The international response to SARS-CoV has produced an outstanding number of protein structures in 
a very short time. This review summarizes the findings of functional and structural studies including 
those derived from cryoelectron microscopy, small angle X-ray scattering, NMR spectroscopy, and 
X-ray crystallography, and incorporates bioinformatics predictions where no structural data are available. 
Structures that shed light on the function and biological roles of the proteins in viral replication and 
pathogenesis are highlighted. The high percentage of novel protein folds identified among SARS-CoV 
proteins is discussed. 

© 2013 Published by Elsevier B.V. 


1. Introduction 

In the wake of the SARS crisis, a wave of structural proteomics 
swept the coronavirus research community. The focus of this effort 
was to understand the interplay between structure and function in 
what had been, until that time, a somewhat neglected branch of 
the positive-stranded RNA viruses. The unusual aspect of SARS 
proteomics at the time was its evenhandedness: rather than focusing 
exclusively on proteins with well-defined roles in pathogenesis, 
competing international teams attempted to solve structures and 
assign functions across the entire viral proteome. 

This effort brought fresh attention to several little-known 
replicase cofactors, such as the European group's structure of 
the obscure but important RNA-binding protein nsp9 (Egloff 
et al., 2004), the Chinese group's barrel-shaped, 16-subunit 
structure of the nsp7-nsp8 primase complex (Zhai et al., 2005) and the 
American group's long crawl through the giant multi-domain, 
multi-enzymatic protein nsp3, which found the first of three 
SARS-CoV macrodomain folds (Saikatendu et al., 2005). 

Shortly after the outbreak, the sequence of the genome was 
completed and the 3-D structure of Mpro, the main protease essential 
for viral replication, was deposited in the Protein Data Bank (PDB). 
By 2007, the PDB held 100 entries covering 14 of the 28 SARS-CoV 
proteins, and at present count there are 99 structures of coronavirus 
Mpro available in the PDB alone, providing an unprecedented 
database for investigators working on this and related viruses. This 
review summarizes the findings of functional and structural studies 
including those derived from cryoelectron microscopy, small angle 
X-ray scattering, NMR spectroscopy, and X-ray crystallography in 
an attempt to understand the function and biological roles of the 
proteins in viral replication and pathogenesis. 

2. A note on functional organization 

The new wealth of structural and functional information 
revealed that the coronavirus replicase, which is but one 
biologically successful example of the conserved nidovirus replicative 
machinery (Lauber et al., 2013), is not a patchwork amalgam of 
evolutionary jetsam, but an organized piece of biological machinery 
in which proteins are generally grouped into units with related 
functions (Fig. 1). The first two parts of the replicase, nsp1 and 
nsp2, are somewhat enigmatic, but appear to work by interfering 
with host defenses rather than by directly supporting virus 
replication. Subunits nsp3-6 contain all the viral factors that are necessary 
to form viral replicative organelles (Angelini et al., 2013), as well 
as two proteinases that are responsible for processing all of the 
viral replicase proteins (Ziebuhr et al., 2000). The small subunits 
nsp7-11 comprise the viral primer-making activities and provide 
other essential support for replication (Donaldson et al., 2007b; 
Imbert et al., 2006; Miknis et al., 2009). The final part of the replicase, 
nsp12-16, contains the remaining RNA-modifying enzymes 
needed for replication, RNA capping and proofreading. 
Fig. 1. Conservation of the SARS-CoV replicase. Replicase subunits, or domains for nsp3, were color-coded according to percent identity between homologous proteins of 
SARS-CoV and MERS-CoV. Alignments and identity calculations were performed using Clustal Omega (Sievers et al., 2011). 


The organization of the replicase has a sort of chronological logic 
to it. Nsp1 and nsp2 help to colonize the host, followed by nsp3-6, 
which lay a foundation to organize and protect the replicative 
machinery. This is followed by the primer-making activities of nsp7-11, 
which also interact with downstream capping and RNA synthesis 
factors (Bouvet et al., 2010). Finally, in the proper framework, the 
RNA-synthesizing enzymes from the C-terminus of the replicase 
are able to function. While this may be an appealing way to think 
of the replicase, the reality is probably much more complex. The 
replicase proteins are all processed from large polyproteins, and 
therefore are produced at the same time. Because of this, the order 
in which different proteins are active during the viral replication 
cycle remains poorly understood. 

The organization of the replicase also roughly follows a gradient 
of primary sequence conservation. Levels of sequence conservation 
among the different coronaviruses are highest at the 3' end of 
the replicase gene, and the sequences are very divergent at the 5' 
end, especially in nsp1-3, which are products of nsp3 PLpro 
cleavage. The double-membrane vesicle (DMV)-making proteins and the primase group of proteins 
show intermediate levels of conservation, with the exception of the 
well-conserved nsp5 (Mpro). Fig. 1 illustrates amino acid conservation 
across the replicase using the comparison between SARS-CoV 
and MERS-CoV as an example. 
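The Fig. 1 caption states only that Clustal Omega was used for the alignments and identity calculations, and percent identity can be defined in more than one way. The Python sketch below assumes one common convention (identical residues divided by alignment columns in which neither sequence has a gap), which may not be exactly the one used for Fig. 1; the function name and the toy alignment are hypothetical and are not real SARS-CoV or MERS-CoV sequence.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumed convention: identical residues / columns where neither
    sequence has a gap. Other conventions (e.g. dividing by the length
    of the shorter ungapped sequence) give different numbers.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # gapped columns are excluded from the denominator
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Hypothetical toy alignment: 9 ungapped columns, 7 of them identical.
print(round(percent_identity("MESLV-LGVN", "MEALVQLGVD"), 1))  # 77.8
```

Choosing a different denominator is the main reason published identity values for the same pair of proteins can differ, so the convention should be stated alongside any such figure.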

3. The Atlas 

3.1. Nsp1 

See Box 1. 

3.1.1. Structure 

Nsp1 is the N-terminal cleavage product of the replicase 
polyprotein and is produced by the action of PLpro. Nsp1 is not found 
in the gammacoronavirus or deltacoronavirus lineages, which 
encode a distant homolog of SARS-CoV nsp2 at the N-terminus of 
the replicase polyprotein. This has led to the suggestion that nsp1 is 
useful as a group-specific marker (Snijder et al., 2003). SARS-CoV 
nsp1 is 179 residues long. 

In the alphacoronaviruses, nsp1 (also known as p9) is a protein 
of about 110 residues, with 20-50% sequence identity among 
all alphacoronaviruses. The betacoronaviruses of subgroup A, such 
as murine hepatitis virus (MHV) and human coronavirus OC43, 
encode an nsp1 protein of about 245 residues, also known as p28 
(Brockway and Denison, 2005). The nsp1 of SARS-CoV and its bat 


Box 1: Key nsp1 and nsp2 structures 

Virus | Protein | Method | Accession | Reference 
SARS-CoV | nsp1 | NMR | 2HSX | Almeida et al. (2007) 
TGEV | nsp1 | X-ray (1.49 Å) | 3ZBD | Jansson (2013) 
IBV | nsp2 | X-ray (2.5 Å) | 3LD1 | Yu et al. (2012) 


equivalents, which have been classified as the only members to 
date of betacoronavirus subgroup B (Gorbalenya et al., 2006; 
Gorbalenya et al., 2004; Snijder et al., 2003), have 179 residues. 
Nsp1 sequences are divergent between subgroups of betacoronavirus, 
and no sequence similarity between SARS-CoV nsp1 and 
betacoronavirus subgroup A nsp1 proteins could be identified using 
standard search tools such as PSI-BLAST. 

Almeida et al. (2006, 2007) determined the NMR structure of 
the nsp1 segment from residue 13 to 128 and also showed that the 
polypeptide segments of residues 1-12 and 129-179 are flexibly 
disordered (PDB IDs 2GDT and 2HSX) (Almeida et al., 2007). Residues 
13-128 of nsp1 form a novel α/β fold composed of a mixed 
parallel/antiparallel six-stranded β-barrel, an α-helix covering one 
opening of the barrel, and a 3₁₀-helix alongside the barrel (Fig. 2). 
NMR data indicate that full-length nsp1 has the same globular 
fold as the truncated nsp1, but with additional flexibly disordered 
regions that correspond to the N-terminal region (residues 1-12) 
and the long C-terminal tail (residues 129-179). 

The C-terminal portion of SARS-CoV nsp1 is flexibly disordered. 
Interestingly, the C-terminal half of MHV nsp1 (Lys124-Leu241) has 
been shown to be dispensable for viral replication in culture but 
important for efficient proteolytic cleavage at the nsp1-nsp2 
junction by the papain-like protease and for optimal 
viral replication (Brockway and Denison, 2005). Likewise, the long 
disordered C-terminus of SARS-CoV nsp1 is probably important for 
the efficient proteolytic processing of this protein from the nascent 
viral polyprotein chain. 

The nsp1 of transmissible gastroenteritis virus (TGEV) was 
recently solved, and was found to contain a fold similar to that of 
SARS-CoV nsp1 (Jansson, 2013). This was surprising, as there is no detectable 
homology between alphacoronavirus and betacoronavirus nsp1 
proteins. However, the structural similarity suggests that coronavirus 
nsp1 proteins share a common evolutionary origin. 

3.1.2. Function 

In several coronaviruses, nsp1 suppresses host gene expression 
(Huang et al., 2011; Kamitani et al., 2006; Narayanan et al., 2008; 
Züst et al., 2007a). The mechanism of nsp1-mediated host 
suppression remains a topic of active research. SARS-CoV nsp1 binds to 40S 
ribosomal subunits and suppresses host gene expression (Kamitani 
et al., 2009). It has also been shown that SARS-CoV nsp1 promotes 
host mRNA degradation, whereas coronavirus RNA species are protected 
from degradation (Kamitani et al., 2009; Tanaka et al., 2012). 

Fig. 2. Comparison of nsp1 structure in alpha- and betacoronavirus lineages. The 
SARS-CoV nsp1 structure comes from PDB entry 2HSX, and the TGEV nsp1 structure 
comes from PDB entry 3ZBD. 

Fig. 3. Bioinformatics detection of homology and domain organization in nsp2. The structure of IBV nsp2 is taken from PDB entry 3LD1. Predicted secondary structures for 
representative coronavirus lineages were made by PSIPRED 3.0 (McGuffin et al., 2000). Rectangles represent α-helices and arrows represent β-strands. 

In MHV, nsp1 arrests transfected cells in the G0/G1 phase of the cell 
cycle (Chen et al., 2004). A point mutation in the 
proteolytic cleavage site between nsp1 and nsp2 in a full-length 
TGEV genome blocks the release of nsp1 from the nascent 
polyprotein and causes a dramatic reduction in virus recovery (Galan et al., 
2005). 

Kamitani et al. have shown that plasmid-driven expression of 
nsp1 (driven by SV40, CMV and IFN-β promoters) sharply reduces 
protein expression (Kamitani et al., 2006). This correlates with 
a reduction in the specific mRNA, whereas rRNA remained 
unaffected. More generally, transfected nsp1 mRNA that was capped 
and polyadenylated decreased host protein synthesis, and the 
inclusion of actinomycin D (to block new transcription) showed 
a much stronger inhibition of protein synthesis in the presence of 
nsp1, demonstrating that while translation of new transcripts 
was proceeding (in cells not treated with actinomycin D), 
translation from pre-existing transcripts was blocked by nsp1. Decreased 
mRNA levels and decreased translation of pre-existing mRNA, 
presumably as a result of degradation, were also seen during infection 
with SARS-CoV. 

SARS-CoV nsp1 has also been shown to be a potent inducer of 
CCL5, CXCL10 and CCL3 expression in human lung epithelial cells 
via the activation of NF-κB (Law et al., 2007). The pathogenesis of 
SARS-CoV infection is characterized by a hyperimmune response 
and the massive elevation of chemokine levels. In contrast, HCoV-229E, 
HCoV-OC43, and MHV did not significantly induce chemokine 
expression, perhaps because the two human viruses cause only mild 
upper respiratory tract disease. 

The kinetics of nsp1 expression suggest that it might have 
an early regulatory role during the viral life cycle. Nsp1 is the 
first mature protein processed from the gene 1 polyprotein and 
is likely cleaved quickly following translation of PL1pro within nsp3 
(Baker et al., 1989; Baker et al., 1993; Denison and Perlman, 1987; 
Denison et al., 1992, 1995; Denison and Perlman, 1986). MHV 
mutants that are incapable of liberating nsp1 from the nascent 
polyprotein exhibit delayed replication, diminished peak titers, 
small plaques, and reduced RNA synthesis compared to wild-type 
controls (Denison et al., 2004). These results emphasize the 
importance of nsp1 cleavage for optimal viral RNA synthesis and suggest 
that nsp1 might play an important role at MHV replication 
complexes. However, later in infection, nsp1 is distinct from replication 
complexes and instead co-localizes with MHV structural proteins 
at virion assembly sites (Brockway et al., 2004). 

In MHV, nsp1 interacts with p10 and p15 (counterparts of SARS-CoV 
nsp7 and nsp10, respectively; Brockway et al., 2004). Previous 
immunolocalization and interaction studies in MHV have also 
indicated that in vivo, nsp1 may act in concert with numerous other 
viral proteins: counterparts of SARS-CoV nsp2, 5, 8, 9, 12, 13 and 
sars9a (Bost et al., 2000; Brockway et al., 2004). However, at some 
stages in the MHV life cycle, nsp1 has a membrane localization 
distinct from that of p65, the SARS-CoV nsp2 counterpart. Yeast 
two-hybrid (Y2H) and co-immunoprecipitation studies 
indicate that nsp1 interacts with E and sars3a (von Brunn et al., 
2007). 

3.2. Nsp2 

3.2.1. Structure 

SARS-CoV nsp2 is a counterpart of the p65 protein (Denison 
et al., 1995) of MHV. As with nsp1, sequence homology is very low 
and does not permit confident sequence alignment. However, as 
with nsp1, bioinformatics gives an indication that coronavirus nsp2 
proteins share a common fold and origin (Fig. 3). The structure of 
the N-terminal 359 amino acids of the IBV equivalent of nsp2 has 
been released but is awaiting full publication, though a 
crystallization report is available (Yang et al., 2009; Yu et al., 2012). 
Crystallization of part of SARS-CoV nsp2 (Li et al., 2011) has also 
been reported, but the structure is not currently available pending 
full publication. The solved region of IBV nsp2 comprises about half 
the protein. It represents a novel multi-domain fold, though further 
structural and functional details await full publication. 

Interestingly, secondary structure prediction suggests that 
coronavirus nsp2 proteins consist of a duplicated fold, with a second, 
more conserved fold similar to the structure 3LD1 immediately 
following the region solved in 3LD1 (Fig. 3). This is seen most clearly 
in the gammacoronaviruses and deltacoronaviruses. This would fit 
the context of domain and fold duplication at the N-terminal part of 
the replicase polyprotein, which has been observed across the 
Coronaviridae and includes duplicated ubiquitin-like, papain-like, 
and macrodomain folds (Neuman et al., 2008). 

3.2.2. Function 

The functions of nsp2 remain unknown. In MHV, p65 has a membrane 
localization distinct from that of nsp1 and co-localizes 
with the MHV homologue of SARS-CoV nsp8 (Sims et al., 2000). In MHV, 
p65 plays an important role in the viral life cycle (Hughes et al., 
1993) that appears to be distinct from that of its counterparts in 
other coronaviruses (Bost et al., 2001; Denison et al., 2004; Sims 
et al., 2000). Based on immunolocalization studies in MHV, p65 
may function in concert with counterparts of SARS-CoV nsp1, 5, 7, 8, 9, 
10, 12, 13 and sars9a (Bost et al., 2000; Brockway et al., 2004). 
Deletion mutagenesis with infectious clones of SARS-CoV and MHV indicated 
that nsp2 is dispensable for viral replication in cell culture; 
however, deletion of the nsp2 coding sequence attenuates viral growth 
and RNA synthesis. 

The exact nature of the role of nsp2 in viral growth and RNA 
synthesis is still not clear. However, IBV nsp2 has weak PKR antagonist 
activity, which may hint at a role complementary to that of nsp1 in 
interfering with intracellular immunity. A proteomics study with 
full-length SARS-CoV nsp2 also found that nsp2 bound prohibitin 
1 and prohibitin 2, which could contribute to the hypothetical role 
of nsp2 in counteracting intracellular immunity (Cornillez-Ty et al., 
2009). 

Fig. 4. Domain architecture and structure of the conserved DMV-making proteins nsp3 through nsp6. A partially schematic view combining the known domain structures 
of SARS-CoV nsp3 is shown in panel A, combining the PDB entries 2IDY, 2ACF, 2FE8, 2KAF, 2JZF, 2W2G and 2K87. Conservation of structural domains and other important 
sequence features is shown for four representative coronaviruses in panel B. Structures of the C-terminal domain of FCoV nsp4 (3GZF), SARS-CoV nsp5 (1UJ1), TGEV nsp5 
(1LVO) and IBV nsp5 (2Q6F) are shown in panels C-F, respectively. 

3.3. Nsp3 

3.3.1. Structure 

SARS-CoV nsp3 is a large multidomain protein with 1922 residues 
(Snijder et al., 2003; Thiel et al., 2003). Nsp3, together with nsp4, nsp5 and 
nsp6, forms a conserved block of proteins that are involved in 
forming the double-membrane vesicles that are the site of viral RNA 
synthesis (Fig. 4). Every completely sequenced coronavirus has an 
nsp3-related protein. All nsp3s are ~200 kDa and are cleaved from 
polyprotein 1a or 1ab by PLpro. 

We have compiled a higher-resolution analysis of nsp3 domain 
architecture as a tool for novel structural and functional 
characterization (Neuman et al., 2008). Based on phylogenetic analysis 
of coronavirus and torovirus nsp3 homologues, results from 
previously published studies (Gorbalenya et al., 2006; Ratia et al., 
2006; Saikatendu et al., 2005; Serrano et al., 2007; Thiel et al., 
2003; Ziebuhr et al., 2001) and de novo domain prediction 
software (Jaroszewski et al., 2005), we estimate that SARS-CoV nsp3 has 
about 14 domains: UB1, AC, ADRP, SUD-N, SUD-M, SUD-C, UB2, PL2pro, 
NAB, G2M, TM1, ZF, TM2, and Y, the last of which may contain three 
structural domains. SARS-CoV nsp3 lacks the PL1pro domain found in 
several other CoVs. A partially schematic model of the nsp3 structure is shown (Fig. 4A). 
Inferring from the presence of PL2pro cleavage sites at both termini 
of nsp3, the observed glycosylation at positions 1431 and 1434 in 
the ZF domain of SARS-CoV (Harcourt et al., 2004) and the 
homologous region of MHV (Kanjanahaluethai et al., 2007), SARS-CoV 
nsp3 contains two transmembrane spans, placing the first 1395 
residues (including the PL2pro domain) and the last 377 residues 
(the Y domain) on the same face of the membrane (Oostra et al., 
2008). The two TM helices probably consist of residues 1396-1418 
and 1523-1545. This transmembrane topology is similar to that 
proposed for MHV nsp3 (Kanjanahaluethai et al., 2007). Between 
helices two and three, there is a central, absolutely conserved tetrad 
of cysteines (C-x(14,18)-C-x(4,5)-C-x(2)-C), which may represent a Zn finger 
and is likely on the same side of the membrane as the domains 
N- and C-terminal to the TM region. 
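The topology argument above reduces to counting membrane crossings. The Python sketch below rests on a simplifying assumption: each of the two TM spans given above (residues 1396-1418 and 1523-1545 of the 1922-residue protein) crosses the membrane exactly once, so the membrane face flips after every span that precedes a residue. It also writes the cysteine tetrad as a regular expression, following a PROSITE-style reading of the motif, C-x(14,18)-C-x(4,5)-C-x(2)-C. The function names and the toy sequence are hypothetical; no real nsp3 sequence is used.

```python
import re

# TM spans of SARS-CoV nsp3 as stated in the text (1-based, inclusive).
TM_SPANS = [(1396, 1418), (1523, 1545)]
NSP3_LENGTH = 1922


def membrane_face(residue: int, tm_spans=TM_SPANS) -> int:
    """Return 0 or 1 for the membrane face of a residue outside the TM spans.

    Assumption: each span crosses the membrane once, so the face flips
    after every span that ends before the residue.
    """
    for start, end in tm_spans:
        if start <= residue <= end:
            raise ValueError(f"residue {residue} lies inside a TM span")
    crossings = sum(1 for _, end in tm_spans if end < residue)
    return crossings % 2


# Cysteine tetrad C-x(14,18)-C-x(4,5)-C-x(2)-C as a regular expression.
CYS_TETRAD = re.compile(r"C.{14,18}C.{4,5}C.{2}C")


def find_cys_tetrads(sequence: str) -> list:
    """Return 1-based (start, end) positions of non-overlapping motif matches."""
    return [(m.start() + 1, m.end()) for m in CYS_TETRAD.finditer(sequence.upper())]


if __name__ == "__main__":
    # Residue 1 and the first of the last 377 residues (1546) share a face.
    print(membrane_face(1), membrane_face(NSP3_LENGTH - 376))  # 0 0
    # The glycosylated positions lie between the two TM spans.
    print(membrane_face(1431), membrane_face(1434))  # 1 1
    toy = "M" + "C" + "A" * 15 + "C" + "G" * 4 + "C" + "SS" + "C" + "K"
    print(find_cys_tetrads(toy))  # [(2, 26)]
```

With two single-pass spans, the N-terminal block and the Y domain end up on the same face, while positions 1431 and 1434 fall between the spans, on the opposite face. A real topology prediction would also need to account for additional hydrophobic segments that do not fully cross the membrane, which this parity count ignores.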

3.3.2. Function 

Although the function of the N-terminal region of polyproteins 
1a/1ab is not known, both the transcription-negative 
phenotype of an alphavirus X domain mutant (von Brunn et al., 
2007) and the conservation of a transcription factor-like zinc 
finger in coronavirus PLpro domains (Culver et al., 1993) indicated that 
nsp3 might be involved in coronavirus RNA synthesis. This 
hypothesis is strongly supported by a report in which equine arteritis 
virus nonstructural protein 1, which is most probably a distant 
homolog of the coronavirus PLpro, is shown to be a transcriptional 
factor that is indispensable for subgenomic mRNA synthesis (Phizicky and 
Greer, 1993). 

3.4. UB1 and AC 

3.4.1. Structure 

The sequence of the N-terminal domain of nsp3 (residues 1-183) is highly 
conserved in different SARS coronavirus isolates but shows less 
than 25% sequence identity with other known proteins. This 
region comprises two parts with different physicochemical and 
structural properties. NMR was used to determine the 
structure of the N-terminal domain (residues 1-110), which exhibits 
a ubiquitin-like fold with two additional helices that make the 
overall structure of this domain (UB1 domain) more elongated than 
other ubiquitin-like proteins (Serrano et al., 2007). NMR studies 
revealed that the highly acidic (51% Glu/Asp residues) C-terminal 
domain (residues 111-183; AC domain) is structurally disordered 
(Serrano et al., 2007). 

3.4.2. Function 

UB1 has high structural homology to Ras-interacting proteins 
such as the Ras-interacting domain (RID) of RALGDS, a member 
of the RA family, and conserves residues important for the 
interaction with Ras. Ras family proteins (RFPs) act as molecular 
switches that cycle between inactive GDP-bound and active GTP-bound 
states. RFPs control cell growth, motility, intracellular transport 
and differentiation. Ras plays a fundamental role in cell 
progression from phase G0 to G1 (Dobrowolski et al., 1994; Peeper et al., 
1997), and molecular interactions that inactivate Ras prevent 
cell progression to G1 phase. SARS-CoV and other coronaviruses 
such as MHV are able to induce cell cycle arrest in G0/G1 phase during 
the lytic infection cycle for their own replication advantage (Chen 
and Makino, 2004; Yuan et al., 2005). Sars3b plays a role in this 
process (Yuan et al., 2005), and nsp3 may also be involved in arresting 
the cell cycle in G0 phase. 

Additionally, UB1 has structural homology with ISG15, an 
interferon-induced protein constitutively present in higher 
eukaryotes. This protein conjugates with cellular targets as a primary 
response to interferon-α/β induction and other markers of viral or 
parasitic infection. High levels of this protein are essential for the 
cellular antiviral response. ISG15 is known to inhibit virus 
replication by abrogating nuclear processing of unspliced viral RNA 
precursors. However, some viruses have developed mechanisms to 
evade ISG15. For example, influenza B virus blocks ISG15 
by means of its NS1 protein in order to overcome the 
immune response. It is possible that the PL2pro domain of nsp3 may 
bind ISG15 and subvert the antiviral response of the cell. 

The functional significance of RNA binding by this domain is 
unknown. It possesses a ubiquitin fold, as does the domain 
N-terminal to the PLpro domain. We also do not know whether this fact 
has any functional significance. Also, the predicted function of this 
domain based on its similarity to Ras-binding proteins and ISG15 
remains to be experimentally validated. 

NMR experiments indicated a ligand bound to UB1, which was 
identified as a small RNA fragment by mass spectrometry. NMR 
studies have identified the interacting molecular interfaces. UB1 
of MHV has recently been shown to interact with the nucleoprotein, 
effectively tethering nsp3 to viral RNA during the replication 
process (Hurst et al., 2013). This activity does not require the AC 
domain that follows UB1 and is hypervariable. 

3.5. ADRP/Macro1 

3.5.1. Structure 

The crystal structure of a construct consisting of residues 
184-365 has been determined for SARS-CoV (Saikatendu et al., 
2005), and the corresponding region has since been solved for 
several other coronaviruses (see Box 2). This region of nsp3 adopts 
a macro H2A domain fold. The putative active site and substrate-binding 
residues are conserved in its three structural homologues 
(yeast Ymx7, Archaeoglobus fulgidus AF1521 and Er58 from E. coli) 
and in its sequence homologue, yeast YBR022W, a known 
phosphatase that acts on ADP-ribose-1''-phosphate (Appr-1''-p), the activity that gives ADRP (ADP-ribose-1''-phosphatase) its name. 
The notable exception is that proposed active site residue Asp90 in 
Ymx7 is an alanine in both the SARS-CoV ADRP (Ala50) and AF1521 
(Ala44). Histidine residues in both enzymes proximal to the 
terminal 1'' phosphate of the substrate (His45 in ADRP and His39 
in AF1521) might therefore be involved in catalysis (Saikatendu 
et al., 2005). Alternatively, the predominant nucleophile in the 
catalytic site may actually be an Asp or Glu in the conformationally 
flexible loop containing the sequence NAGEDIQ in SARS-CoV and the corresponding 
region in other coronaviral ADRPs (Saikatendu et al., 2005). The 


Box 2: Key nsp3 and nsp4 structures 

Virus | Domain | Method | Accession | Reference 
SARS-CoV | UB1 and AC | NMR | 2IDY | Serrano et al. (2007) 
HCoV-229E | ADRP | X-ray (2.0 Å) | 3EWR | Xu et al. (2009) 
FCoV | ADRP | X-ray (3.9 Å) | 3JZT | Wojdyla et al. (2009) 
IBV | ADRP | X-ray (2.0 Å) | 3EWP | Xu et al. (2009) 
HCoV-NL63 | ADRP | X-ray (1.9 Å) | 2VRI | Awaiting publication 
SARS-CoV | ADRP | X-ray (1.4 Å) | 2ACF | Saikatendu et al. (2005) 
TGEV | PL1pro | X-ray (2.5 Å) | 3MP2 | Wojdyla et al. (2010) 
SARS-CoV | UB2-PL2pro | X-ray (1.9 Å) | 2FE8 | Ratia et al. (2006) 
SARS-CoV | SUD-N-M | X-ray (2.2 Å) | 2W2G | Tan et al. (2009) 
SARS-CoV | SUD-M | NMR | 2JZF | Chatterjee et al. (2009) 
SARS-CoV | SUD-C | NMR | 2KAF | Johnson et al. (2010) 
SARS-CoV | NAB | NMR | 2K87 | Serrano et al. (2009) 
FCoV | nsp4-CTD | X-ray (2.8 Å) | 3GZF | Manolaridis et al. (2009) 



former proposition was verified by site-directed mutagenesis data 
in the HCoV-229E ADRP, which showed that residues Asn37, Asn40, 
Gly44, His45 and Gly48 are part of the active site in the SARS-CoV 
ADRP (Putics et al., 2005). 

3.5.2. Function 

The SARS ADRP readily hydrolyzes the 1''-phosphate group 
from Appr-1''-p in vitro, demonstrating that it is an active enzyme 
(Saikatendu et al., 2005). Another group validated this finding; both 
the SARS ADRP and the human coronavirus HCoV-229E counterpart 
were shown to dephosphorylate Appr-1''-p to ADP-ribose in a 
highly specific manner, the enzyme having no detectable activity 
on several other nucleoside phosphates (Putics et al., 2005). 

The role of an ADRP in the coronavirus life cycle may closely 
parallel that in the eukaryotic tRNA splicing pathway (Culver et al., 
1993; Phizicky and Greer, 1993; Saikatendu et al., 2005; Snijder 
et al., 2003). In coronaviruses, an early post-infection event is 
the transcription of a nested set of subgenomic mRNAs. Each 
subgenomic mRNA contains a short 5'-terminal ‘leader’ sequence 
derived from the 5' end of the genome (Lai and Holmes, 2001a,b; 
Thiel et al., 2003). The fusion of the two noncontiguous RNA 
segments is a poorly understood process. It is thought to be achieved 
by a discontinuous step in the synthesis of the minus strand and 
involves transcription regulatory sequences (Pasternak et al., 2001; 
Thiel et al., 2003). In eukaryotes, pre-tRNA splicing is initiated 
by cleavage at the splice site by an endonuclease. The resulting 
tRNA fragments are then ligated to yield mature tRNA that 
retains the 2'-phosphomonoester group at the splice site (Phizicky 
and Greer, 1993). Using NAD as an acceptor, a phosphotransferase 
removes the 2'-phosphate to yield ADP-ribose 1''-2'' cyclic 
phosphate (Appr>p) (Culver et al., 1994). A cyclic phosphodiesterase then 
hydrolyzes Appr>p to yield Appr-1''-p (Culver et al., 1994; Martzen 
et al., 1999). Finally, a phosphatase converts Appr-1''-p into 
ADP-ribose and inorganic phosphate. While the equivalent of 
the cyclic phosphodiesterase appears absent from the SARS proteome, 
the Appr-1''-p phosphatase (SARS ADRP) and an endonuclease 
(nsp15) are present. Characterization of an Appr-1''-phosphatase-deficient 
HCoV-229E mutant revealed no significant effects on viral 
RNA synthesis or virus titer (Putics et al., 2005). 
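The correspondence drawn above between host tRNA splicing and the SARS-CoV proteome can be restated as a small lookup table. The Python sketch below only re-encodes the text: steps for which the text names no viral counterpart (ligation and the NAD-dependent phosphotransfer) are marked "not discussed" rather than absent, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplicingStep:
    reaction: str     # what happens at this step of host tRNA splicing
    host_enzyme: str  # eukaryotic enzyme responsible for the step
    sars_cov: str     # counterpart status as stated in the text


STEPS = [
    SplicingStep("cleave pre-tRNA at the splice site",
                 "endonuclease", "present: nsp15"),
    SplicingStep("ligate the tRNA halves, retaining a 2'-phosphate",
                 "ligase", "not discussed"),
    SplicingStep("transfer the 2'-phosphate to NAD, yielding Appr>p",
                 "2'-phosphotransferase", "not discussed"),
    SplicingStep("hydrolyze Appr>p to Appr-1''-p",
                 "cyclic phosphodiesterase", "absent"),
    SplicingStep("dephosphorylate Appr-1''-p to ADP-ribose + Pi",
                 "Appr-1''-p phosphatase", "present: nsp3 ADRP"),
]


def enzymes_with_status(prefix: str) -> list:
    """Host enzymes whose SARS-CoV status string starts with `prefix`."""
    return [step.host_enzyme for step in STEPS if step.sars_cov.startswith(prefix)]


print(enzymes_with_status("present"))  # ['endonuclease', "Appr-1''-p phosphatase"]
print(enzymes_with_status("absent"))   # ['cyclic phosphodiesterase']
```

Keeping "not discussed" separate from "absent" avoids implying that the text rules out viral counterparts for the ligation and phosphotransfer steps.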

Egloff et al. (2006) suggested that ADRP may primarily be a 
poly-ADP-ribose-binding (PAR-binding) module. PARylation occurs in 
compromised cells to trigger apoptosis. PAR polymerases (PARPs) 
are responsible for this tagging of proteins. PARP is activated on 
recognizing nicked DNA, and it helps in DNA repair. It auto-PARylates, 
and in cases of extreme DNA damage, becomes overactivated and 
depletes the cell of its nucleotide pool. If ADRP binds PAR, then it 
can bind proteins that are PARylated, including PARP. Indeed, 
binding the latter may be most beneficial, since it can tether down this 
protein, slow down apoptosis, and prevent nucleotide depletion, 
prolonging viral replication and transcription in the infected cell. 

The existence of the ADRP domain in all CoV nsp3s (as well as in 
several other viruses) argues for its critical role in the viral life cycle. 
Its enzymatic function as an ADRP appears to be dispensable and does 
not explain the conservation of this domain. Its role as an ADRP may therefore be 
secondary to a more important role, such as its proposed role as a 
PAR-binding module. If so, the role of PAR binding in the viral life 
cycle needs to be delineated experimentally. 

3.6. SARS-unique domain 

3.6.1. Structure 

The region corresponding to residues 366-722 has been 
considered a domain unique to SARS-CoV and is called the SUD (SARS 
Unique Domain). The corresponding regions, which are located just 
downstream of the ADRP domain in HCoV-NL63, PEDV, HCoV-229E 
(among alphacoronaviruses) and all betacoronaviruses, have 
no assigned domain, but secondary structure prediction 
suggests the presence of an additional macrodomain fold 
(Chatterjee et al., 2009). 

It has been demonstrated that the SUD may actually consist 
of three domains, termed by position: the N-terminal SUD-N, the 
middle SUD-M and the C-terminal SUD-C. Deuterium exchange 
mass spectrometry data on a construct nsp3:451-651 initially 
appeared to support this notion, indicating that a well-ordered 
domain exists from residues 523-651. Constructs representing 
nsp3:365-722 have been shown to be particularly susceptible to 
proteolysis (Stefanie Tech et al., 2004). Size exclusion 
chromatography and PFO-PAGE indicate that it forms a dimer in solution, and 
1D-NMR spectra show that it is well folded. 

The structures of SUD-N and SUD-M have been solved, and 
surprisingly each domain was found to contain a macrodomain fold 
that was a close structural match for the SARS-CoV ADRP/Macro1 
domain despite a lack of detectable amino acid homology between 
these proteins (Tan et al., 2009). The presence of these additional 
macrodomain folds has also been confirmed by the NMR structure 
of the complete SUD (Johnson et al., 2010) and the NMR structure 
of SUD-M (Chatterjee et al., 2009). The SUD-C domain contained 
a novel fold that consisted of an antiparallel beta sheet (Johnson 
et al., 2010). 

3.6.2. Function 

All three of the domains that make up the SUD have been 
demonstrated to interact with nucleic acid in some way. The 
SUD-NM has a high affinity for G-rich sequences and G-quadruplexes 
(Tan et al., 2009), while the SUD-MC showed a general preference 
for purine nucleotides (Johnson et al., 2010). While the SUD-N and 
SUD-M domains bear a close structural resemblance to the 
SARS-CoV ADRP domain, neither domain has any demonstrable affinity 
for ADP-ribose (Tan et al., 2009). The amino acids responsible for 
SUD-M and SUD-C RNA binding have been mapped, and appear to 
fall near the region of SUD-M that corresponds to the active site 
in the structurally similar ADRP domain (Chatterjee et al., 2009). 
Together this suggests that the cluster of three macrodomains in 
SARS-CoV nsp3 arose through gene duplication and that the SUD 
may contribute to the function of nsp3 as an accessory to the viral 
replication process (Neuman et al., 2008). 

3.7. UB2 and PL2pro 
3.7.1. Structure 

Unlike many coronaviruses, which encode two papain-like proteases, 
SARS-CoV encodes a single papain-like cysteine protease (PL2pro) that 
cleaves polyprotein 1a at three sites near its N-terminus 
(177 LNGG|AVT 183, 815 LKGG|API 821 and 2737 LKGG|KIV 2743) to 
release nsp1, nsp2 and nsp3, respectively (Harcourt et al., 2004; 
Thiel et al., 2003). SARS-CoV PL2pro is also a deubiquitinating 
enzyme: it efficiently disassembles diubiquitin and branched 
polyubiquitin chains, cleaves ubiquitin-AMC substrates and has 
de-ISGylating activity (Chen et al., 2007b; Lindner et al., 2005). 
Thus, PL2pro may have critical roles not only in proteolytic 
processing of the replicase complex but also in subverting the cellular 
ubiquitination machinery to facilitate viral replication. The structure 
of a PL2pro construct, nsp3:723-1037, revealed a ubiquitin-like fold 
(residues 723-783; UB2) and a well-ordered papain-like protease 
catalytic domain (residues 784-1036; PL2pro) (Ratia et al., 2006). 
The catalytic domain adopts the canonical “thumb, palm and fingers” 
domain architecture. The thumb domain is formed by four prominent 
helices; the palm is made up of a six-stranded β-sheet that slopes 
into the active site, which is housed in a solvent-exposed cleft 
between the thumb and palm domains; and a four-stranded, twisted, 
antiparallel β-sheet makes up the “fingers” domain. Two β-hairpins at 
the fingertips contain four cysteine residues, which coordinate a zinc 
ion with tetrahedral geometry. Mutational analysis of the 
zinc-coordinating cysteines of SARS-CoV PLpro showed that zinc binding 
is essential for structural integrity and protease activity (Barretto 
et al., 2005). PL2pro has several structural homologues in the cysteine 
protease superfamily, the most significant being USP14 and HAUSP, both 
of which are cellular deubiquitinating enzymes (DUBs). The active site 
of PLpro consists of a catalytic triad of cysteine, histidine and 
aspartic acid residues, consistent with the catalytic triads found in 
many PLpro domains. The recent structure of TGEV PL1pro demonstrates 
that coronavirus PLpro folds share a common architecture (Wojdyla et 
al., 2010) and likely arose through gene duplication. 

3.7.2. Function 

It has been demonstrated that an LXGG motif at the P4-P1 positions of 
the substrate is essential for recognition and cleavage by PL2pro 
(Barretto et al., 2005; Han et al., 2005). There appear to be no 
preferences for the P' positions or for residues N-terminal to P4. It 
is therefore not surprising that PLpro is able to cleave after the four 
C-terminal residues of ubiquitin, LRGG. As predicted by Sulea et al. 
(2005), SARS-CoV PL2pro (nsp3 residues 1507-1858) does possess 
deubiquitinating activity (Barretto et al., 2006; Lindner et al., 
2005) in addition to its better-known cysteine protease activity. 
This activity was inhibited by the specific deubiquitinating enzyme 
inhibitor ubiquitin aldehyde, with a Ki of 210 nM. 
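The LXGG rule lends itself to a simple pattern check. The Python sketch below is illustrative only. The helper `matches_lxgg` is hypothetical, and it inspects only the four residues P4-P1, which is consistent with the reported lack of preference at the P' positions and N-terminal to P4. It takes no account of structural accessibility, so a match should not be read as a prediction of actual cleavage.

```python
import re

# Hypothetical helper: does a P4-P3-P2-P1 tetrapeptide fit the LXGG consensus?
LXGG = re.compile(r"L[A-Z]GG")


def matches_lxgg(p4_to_p1: str) -> bool:
    """Return True if a 4-residue P4..P1 string matches L-X-G-G."""
    return len(p4_to_p1) == 4 and LXGG.fullmatch(p4_to_p1) is not None


# P4..P1 residues of the three SARS-CoV PL2pro sites quoted in the text,
# plus the four C-terminal residues of ubiquitin.
sites = {
    "nsp1/2": "LNGG",
    "nsp2/3": "LKGG",
    "nsp3/4": "LKGG",
    "ubiquitin C-terminus": "LRGG",
}

for name, p4_p1 in sites.items():
    print(f"{name:22s} {p4_p1} -> {matches_lxgg(p4_p1)}")

# A control with Ala instead of Gly at P1 breaks the motif.
print("LKGA ->", matches_lxgg("LKGA"))
```

All four listed sequences satisfy the motif. To PL2pro, the polyprotein junctions and the ubiquitin C-terminus look alike at P4-P1.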

Interestingly, a number of cellular deubiquitinases, including 
full-length USP14 and Ubp6, possess an N-terminal ubiquitin-like (Ubl) 
domain. Although the significance of this domain in these proteins is 
not well established, the Ubl domain of USP14 and Ubp6 has been shown 
to serve a regulatory function by mediating interactions between these 
deubiquitinases and specific components of the proteasome (Hu et al., 
2005; Leggett et al., 2002). Comparisons of deubiquitinase activity 
between wild-type Ubp6 and a mutant lacking the Ubl domain revealed 
that these associations are responsible for a 300-fold increase in 
catalytic rate and serve to activate the enzyme (Leggett et al., 2002). 
It is intriguing to consider that the Ubl domain of PLpro may instead 
act as a sort of “decoy” or “lure” to divert cellular ubiquitinating 
enzymes from other viral proteins, or that it may mediate 
protein-protein interactions between the replicase components. 

While the role of PL2pro in polyprotein processing is well understood, 
the physiological significance of its deubiquitinating activity in the 
viral replication cycle is still not completely clear. However, the 
conserved structural protein E is readily ubiquitinated in infected 
cells, suggesting that deubiquitination may be important in the 
assembly process (Alvarez et al., 2010). There is now mounting 
evidence that PL2pro interferes with interferon transcriptional 
activation pathways by inactivating TBK1, blocking NF-κB signaling and 
preventing translocation of IRF3 to the nucleus (Frieman et al., 2009; 
Wang et al., 2011; Zheng et al., 2008). 

3.8. NAB and GSM 

3.8.1. Structure 

The region between the PL2pro domain and the transmembrane region of 
nsp3 does not show sequence similarity to any known domain. The 
disorder prediction programs DisEMBL 1.4, FoldIndex and RONN predict a 
long disordered stretch in the central part of this segment in 
SARS-CoV, suggesting that it may consist of two domains, the second 
beginning at around residue 1226. An NMR structural study confirmed 
that the NAB is an independently folded, functionally active unit. The 
solved region comprised residues 1066-1181, studied using the 
constructs nsp3(1066-1203) and nsp3(1035-1181). This globular domain 
represents a new fold, with a parallel four-stranded β-sheet holding 
two α-helices of three and four turns that are oriented antiparallel 
to the β-strands. Two antiparallel two-stranded β-sheets and two 
3₁₀-helices are anchored against the surface of this barrel-like 
molecular core. NMR identified a positively charged patch on the 
molecular surface as the site of nucleic acid binding activity. 

3.8.2. Function 

The NAB has been demonstrated to form homodimers upon incubation at 
37 °C (Neuman et al., 2008), and displayed a high affinity for nucleic 
acid. Although the NAB was able to interact with both single-stranded 
and double-stranded nucleic acids, cooling the protein-nucleic acid 
complex released single-stranded RNA, indicating that the NAB may 
function as an ssRNA-binding protein with RNA chaperone-like activity 
(Neuman et al., 2008). Little else is known about the function of the 
NAB in the viral replication cycle, or about the structure and function 
of the GSM domain that follows it or the conserved hydrophobic, 
non-transmembrane region that immediately precedes the first 
transmembrane region of nsp3. 

3.9. TM, ZF and Y 

3.9.1. Structure 

The region of nsp3 after the TM domain is highly conserved in all CoVs, 
but it has not yet been structurally characterized. A Fold and Function 
Assignment System search (FFAS; Jaroszewski et al., 2005) using the 
SARS-CoV sequence from the RBD to the end of nsp3, spanning the TM and 
Y domains, returned three of seven significant hits (with expect values 
of -8 or better) to viral RdRp proteins. This may hint at the 
evolutionary origin of nsp3, which accounts for nearly one fifth of 
most coronavirus genomes. The level of conservation in the Y domain in 
particular approaches that of the other enzymatic domains of nsp3, and 
exceeds that of the domains believed to be non-enzymatic (Neuman et 
al., 2008). 

3.9.2. Function 

It appears that the domains from PL2pro to the Y domain have not 
undergone significant deletion or rearrangement during coronavirus 
evolution, whereas other nsps, such as nsp1, nsp2 and the N-terminal 
regions of nsp3, have clearly evolved by duplication and deletion of 
domains (Neuman et al., 2008). This suggests that nsp3 confers a basic 
and important function in a variety of hosts. UB1, SUD and RBD bind 
RNA, and ADRP is part of the RNA-processing machinery. If not for the 
proteinase(s), nsp3 would be classified exclusively as an RNA 
binding/modifying protein. These regions have been shown to change the 
localization of nsp4 (Hagemeijer et al., 2011) and to cause a membrane 
proliferation phenotype in transfected cells (Angelini et al., 2013). 

The topology of nsp3 leaves only one domain, ZF, on the lumenal 
side of the membrane. If nsp3 participates directly in the membrane 
pairing exhibited in cells transfected with SARS-CoV nsp3 and nsp4 
(Angelini et al., 2013), then the ZF domain likely participates in this 
interaction. 

3.10. Nsp4 

3.10.1. Structure 

Nsp4 is a transmembrane protein with four transmembrane 
helices and an internal C-terminal domain (Oostra et al., 2007). 
Coronavirus nsp4 is approximately 500 amino acids in length, and is 
the only part of the viral polyprotein that is released after 
processing by both PLpro and Mpro. The location and topology of the 
four transmembrane regions have been mapped (Oostra et al., 2007). 

The C-terminal domain of nsp4 is conserved in all known coronaviruses, 
but deletion of this domain from an MHV infectious clone resulted in 
only slightly attenuated virus growth, consistent with a non-essential 
function (Sparks et al., 2007). Mutation of the two glycosylation 
sites in nsp4, however, led to defective DMV formation and attenuation 
(Gadlage et al., 2010). The structure of the C-terminal domain of FCoV 
nsp4 has been reported (Fig. 4). It consists of two small antiparallel 
β-sheets and four α-helices (Manolaridis et al., 2009). 

3.10.2. Function 

SARS-CoV nsp4 is an essential component for the formation of viral 
double-membrane vesicles (Angelini et al., 2013). Intracellular 
expression studies have demonstrated a biological interaction between 
nsp4 and the carboxyl-terminal region of MHV nsp3 (Hagemeijer et al., 
2011), and co-expression of full-length SARS-CoV nsp3 and nsp4 results 
in extensive membrane pairing, in which the paired membranes are held 
at the same distance as observed in authentic DMVs (Angelini et al., 
2013). Nsp4 has also been shown to interact with nsp2 in a yeast 
two-hybrid screen (von Brunn et al., 2007), and with other nsp4 
molecules in cells (Hagemeijer et al., 2011). Mutations that lead to a 
loss of nsp4 glycosylation cause aberrant DMV formation (Gadlage et 
al., 2010; Sparks et al., 2007). 


3.11. Nsp5 

3.11.1. Structure 

Mpro, a chymotrypsin-like protease, is encoded by the mature 
polypeptide nsp5. It is released by self-cleavage in trans at the 
nsp4/5 and nsp5/6 boundaries, at residues 3238 VLQ|SGF 3243 and 
3544 TFQ|GKF 3549 of polyproteins 1a and 1ab. It belongs to the C30 
family of endopeptidases and is responsible for cleavage at 11 
sequence-specific sites within polyproteins 1a and 1ab. The resulting 
“mature” protein products (nsp4-16) assemble into components of the 
replication complexes. Given its paramount importance in replicase 
processing, and therefore in viral replication, this protein has been 
extensively studied from both structural and functional perspectives 
(reviewed in Hilgenfeld et al., 2006; Ziebuhr et al., 2000). 

Based on both structure and sequence characteristics, nsp5 can be 
divided into three domains, a prediction that has been confirmed by 
numerous crystal structures. Nsp5 is conserved in all coronaviruses, 
and indeed in all three nidoviral groups and several other RNA viruses 
that share a common polyprotein processing scheme (Ziebuhr et al., 
2000). Its sequence is related to the chymotrypsin-like protease 
superfamily of endopeptidases. 
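Each Mpro site marks the end of one product and the start of the next, so the lengths of the mature proteins follow directly from the cleavage-site coordinates listed in Table 1. The sketch below assumes the table's convention that the first number is the position of the P3 residue in the Tor2 polyprotein. Under that convention, P1 (the last residue of the upstream product) is at P3 + 2. The names `SITES`, `p1_position` and `mature_lengths` are hypothetical. The sketch covers only products flanked by two Mpro sites on pp1ab; in pp1ab, nsp12 follows nsp10 directly.

```python
# Mpro cleavage sites transcribed from Table 1 (SARS-CoV Tor2 numbering).
# Each entry: (junction label, position of the P3 residue, "P3P2P1|P1'P2'P3'").
SITES = [
    ("nsp4/5", 3238, "VLQ|SGF"),
    ("nsp5/6", 3544, "TFQ|GKF"),
    ("nsp6/7", 3834, "TVQ|SKM"),
    ("nsp7/8", 3917, "TLQ|AIA"),
    ("nsp8/9", 4115, "KLQ|NNE"),
    ("nsp9/10", 4228, "RLQ|AGN"),
    ("nsp10/12", 4367, "LMQ|SAD"),  # in pp1ab the downstream product is nsp12
    ("nsp12/13", 5299, "VLQ|AVG"),
    ("nsp13/14", 5900, "TLQ|AEN"),
    ("nsp14/15", 6427, "RLQ|SLE"),
    ("nsp15/16", 6773, "KLQ|ASQ"),
]


def p1_position(p3_pos: int) -> int:
    """P1, the last residue of the upstream product, lies two residues after P3."""
    return p3_pos + 2


def mature_lengths(sites):
    """Length of every product bounded by two consecutive Mpro sites."""
    lengths = {}
    for (_, p3_up, _), (label_down, p3_down, _) in zip(sites, sites[1:]):
        start = p1_position(p3_up) + 1      # P1' of the upstream site
        end = p1_position(p3_down)          # P1 of the downstream site
        product = label_down.split("/")[0]  # e.g. "nsp5" from "nsp5/6"
        lengths[product] = end - start + 1
    return lengths


for product, n in mature_lengths(SITES).items():
    print(f"{product}: {n} residues")
```

Run as written, this reports 306 residues for nsp5 and 932 for nsp12. The nsp12 value agrees with the SARS-CoV nsp12 length given in Section 3.16.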
Box 3: Key nsp5 structures 

Virus       Domain   Method          Accession   Reference 
HCoV-229E   Mpro     X-ray (2.5 Å)   1P9S        Anand et al. (2003) 
HCoV-HKU1   Mpro     X-ray (2.5 Å)   3D23        Zhao et al. (2008) 
HCoV-HKU4   Mpro     X-ray (1.6 Å)   2YNB        Awaiting publication 
IBV         Mpro     X-ray (2.0 Å)   2Q6F        Xue et al. (2008) 
HCoV-NL63   Mpro     X-ray (1.6 Å)   3TLO        Awaiting publication 
SARS-CoV    Mpro     X-ray (1.9 Å)   1UJ1        Yang et al. (2003) 
TGEV        Mpro     X-ray (2.0 Å)   1LVO        Anand et al. (2002) 


Structures of Mpro from five different coronaviruses have been reported 
(Box 3; Fig. 4), and structures for the HCoV-NL63 and HCoV-HKU4 enzymes 
have been released prior to publication. All of these structures show 
stringent conservation of a three-domain tertiary architecture and a 
partially surface-exposed catalytic core. 

The first two domains (residues 8-101 form domain 1 and residues 
102-184 form domain 2; Yang et al., 2003) are duplicated closed 
β-barrels with strand number n = 6 and shear number S = 8 (Murzin et 
al., 1995), in which the strands are arranged in a Greek key motif. 
These are all hallmarks of the trypsin-like serine protease fold as 
defined in the SCOP structural classification database, where Mpro has 
been placed in the “viral cysteine protease of the trypsin fold” 
family. Close homologs in this family include the picornavirus-like 3C 
cysteine proteases. 

The first seven residues at the N terminus play a critical role in 
dimerization and lie close to the active site; as a result, the enzyme 
is an obligate dimer, although modification of the termini appears to 
modulate higher-order oligomerization (Zhang et al., 2010). Deletion 
of the first five amino acids results in complete inactivation of the 
enzyme. The helical C-terminal domain III mediates homodimerization of 
coronaviral Mpro proteases, an interaction believed to be important 
for its trans-proteolytic activity. The active site is located at the 
interface of the two β-barrels, with the catalytic residues His41 and 
Cys145 (SARS-CoV numbering) contributed by domains 1 and 2, 
respectively. 

The active site lies in a substantially solvent-exposed cleft between 
the two β-barrels. Structures of Mpro in complex with peptide or 
peptidomimetic substrates reveal that the substrate peptides occupy 
the S1 and S2 pockets in an antiparallel β-sheet orientation relative 
to the two interacting β-strands of the enzyme active site, a feature 
also seen in subtilisin and related serine proteases. Sequence 
specificity is mainly determined by the S1 binding pocket. All 
coronaviral Mpro enzymes recognize glutamine as the P1 residue, a 
feature largely determined by structural complementarity. The wall of 
the S1 pocket is lined by residues His163 and Phe140 (which contribute 
side chains) and by Met165, Glu166 and His172 (which contribute 
main-chain atoms) (Anand et al., 2003; Yang et al., 2003). Comparison 
of ligand-bound and apo structures has shown that, whereas the S1 
pocket (which houses the P1 residue) remains largely unchanged, the S2 
pocket undergoes significant conformational changes upon ligand 
binding. The preference for leucine, the most common residue at the P2 
position (see Table 1), was structurally explained by Yin and 
co-workers as a ligand-induced ordering of the S2 pocket (Yin et al., 
2007). 

3.11.2. Function 

Nsp5, the most extensively studied of all SARS-CoV proteins, is also 
known as the main protease (Mpro) or, in older literature, as the 
chymotrypsin-like or 3C-like protease (3CLpro). It is the primary 
enzyme responsible for the cleavage and maturation of SARS-CoV 
polyproteins 1a and 1ab. Because of this proteolytic role, it must 
interact with all of the non-structural proteins from nsp4 to nsp16, 
presumably near its catalytic site. 

Mpro cleaves the polyproteins at 11 sequence-specific cleavage sites 
(Table 1); the remaining three sites (nsp1/2, nsp2/3 and nsp3/4) are 
cleaved by PLpro. Mpro is specific for a glutamine at P1 followed by a 
small residue at P1', usually alanine, serine or glycine. 

Hsu et al. (2005) described a four-step process by which a mature, 
catalytically competent Mpro head-to-head dimer is formed from 
multiple copies of polyprotein 1a. Lin et al. (2004) dissected its 
trans-cleavage activity using ELISA-based assays and its cis-cleavage 
activity using cell-based assays. 

Unlike the canonical chymotrypsin-like proteases, Mpro houses a 
conserved catalytic dyad (His41 and Cys145) rather than the catalytic 
triad more common among cysteine proteases. The structures show that 
the role of the third catalytic residue has instead been taken over by 
a conserved water molecule that often lies within hydrogen-bonding 
distance of His41. The role of a conserved “catalytic” water molecule 
has long been recognized in many proteolytic cleavage schemes, 
especially in serine proteases (Perona et al., 1993). The mechanism of 
substrate hydrolysis is nevertheless similar to that of its cysteine 
protease relatives. In the first step (acylation), His41 acts as a 
general base, enabling the sulfur atom of the catalytic cysteine side 
chain to carry out a nucleophilic attack on the carbonyl carbon of the 
peptide bond to be cleaved (Yin et al., 2007). This generates a 
tetrahedral intermediate (TI-1). The intermediate then collapses, and 
the C-terminal half of the peptide product leaves the active site. At 
this stage, the N-terminal portion of the substrate remains covalently 
bound to the enzyme via a thioester linkage. In the other half of the 
reaction cycle (deacylation), a water molecule activated by His41 acts 
as a nucleophile (effectively a hydroxide ion, OH⁻), attacks the 
carbonyl carbon of the thioester and releases the N-terminal half of 
the peptide product, thus regenerating the free cysteine. Numerous 
studies have used this catalytic mechanism and the architecture of the 
catalytic site to design several classes of peptidomimetic inhibitors 
targeting the Mpro of coronaviruses (including SARS-CoV) and other 
pathogenic viruses. Several Mpro inhibitors have also been 
structurally characterized (Akaji et al., 2011; Bacha et al., 2008; 
Chu et al., 2006; Chuck et al., 2013; Grum-Tokars et al., 2007; Lee et 
al., 2007; Lee et al., 2009; Lee et al., 2005; Shan and Xu, 2005; Shao 
et al., 2007; Turlington et al., 2013; Verschueren et al., 2008; Wei 
et al., 2006; Yang et al., 2006; Yang et al., 2003; Yang et al., 2007; 
Zhang et al., 2010; Zhu et al., 2011). 
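The two half-reactions can be summarised as a scheme. This is a schematic sketch for orientation only, and it assumes the generic cysteine-protease acyl-enzyme pathway. R denotes the N-terminal (acyl) part of the substrate, R' the C-terminal part, and E-SH the free catalytic cysteine (Cys145 in SARS-CoV). The deacylation step is condensed into a single arrow.

```latex
\begin{align*}
\text{E-SH} + \text{R-CO-NH-R}'
  &\xrightarrow{\text{His41 as general base}} \text{TI-1}
  && \text{(nucleophilic attack)}\\
\text{TI-1}
  &\longrightarrow \text{E-S-CO-R} + \text{H}_2\text{N-R}'
  && \text{(acylation; C-terminal product leaves)}\\
\text{E-S-CO-R} + \text{H}_2\text{O}
  &\xrightarrow{\text{His41-activated water}} \text{E-SH} + \text{R-COOH}
  && \text{(deacylation; N-terminal product leaves)}
\end{align*}
```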

3.12. Nsp6 

3.12.1. Structure and function 

The membrane topology of nsp6 has been determined (Oostra et al., 
2008). Although SARS-CoV nsp6 is predicted by TMHMM 2.0 (Krogh et al., 
2001) to contain seven transmembrane regions, only six of these 
function as membrane-spanning helices. The presence of additional 
non-transmembrane hydrophobic domains near authentic transmembrane 
domains is a common theme running through the DMV-forming proteins 
nsp3, nsp4 and nsp6. IBV and SARS-CoV nsp6 have been shown to activate 
autophagy, inducing vesicles containing Atg5 and LC3-II (Cottam et 
al., 2011). MHV nsp6 is relocalized when it is co-expressed with nsp4 
(Hagemeijer et al., 2012), suggesting that the two proteins interact. 
Nsp6 has also been shown to interact with nsp2, nsp8, nsp9 and sars9b 
in yeast two-hybrid assays (von Brunn et al., 2007). 

3.13. Nsp7 and nsp8 

See Box 4. 

3.13.1. Structure 

Nsp7 and nsp8 are two mature proteins released by cleavage of 
polyprotein 1a at the 3834 TVQ|SKM 3839 (nsp6/7), 3917 TLQ|AIA 3922 
(nsp7/8) and 4115 KLQ|NNE 4120 (nsp8/9) boundaries. SARS-CoV nsp7 and 
nsp8 self-associate to form a large sixteen-subunit supercomplex that 
has been directly implicated in replication (Bartlam et al., 2005; 
Zhai et al., 2005). 

Please cite this article in press as: Neuman, B.W., et al., Atlas of coronavirus replicase structure. Virus Res. (2013), 
http://dx.doi.org/10.1016/j.virusres.2013.12.004 

ARTICLE IN PRESS 

B.W. Neuman et al. / Virus Research xxx (2013) xxx-xxx 

Box 4: Key nsp7 and nsp8 structures 

Virus      Domain        Method          Accession   Reference 
SARS-CoV   nsp7          NMR             1YSY        Peti et al. (2005) 
SARS-CoV   nsp7 + nsp8   X-ray (2.6 Å)   2AHM        Zhai et al. (2005) 
FCoV       nsp7 + nsp8   X-ray (2.4 Å)   3UB0        Xiao et al. (2012) 

The structure of nsp7 was first determined by NMR (Peti et al., 2005), 
which revealed four helices in a novel sheet-like arrangement: three 
of the helices are antiparallel to one another, while the fourth is 
oriented at an angle to the bundle. Much of the structure-derived 
functional information about nsp7 and nsp8 came from the study by Rao 
and co-workers (Bartlam et al., 2005; Zhai et al., 2005), who 
determined the 2.4 Å resolution crystal structure of the nsp7/8 
supercomplex (Fig. 5). Eight subunits each of nsp7 and nsp8 form a 
tight hexadecameric complex. In this complex, nsp7 adopts a tertiary 
structure similar to its solution structure, the minor deviation being 
that the fourth helix is oriented at a slightly different angle and is 
more ordered, possibly because of its association with nsp8 in the 
crystal. SARS-CoV nsp8 adopts two major conformations, described as 
the “golf club” and “bent golf club” folds, with an extended shaft 
domain of three helices (one of which is very long) and a globular core 
at the C terminus (Zhai et al., 2005). 

The supercomplex, formed by the stoichiometric association of eight 
subunits each of nsp7 and nsp8, is a hollow cylindrical structure with 
a central channel and two handles, one on either side of the 
structure. It has a very distinct bimodal distribution of electrostatic 
charge on its surface: the outer skin of the complex is composed 
predominantly of negatively charged residues, while the inner channel 
is lined with positively charged side chains. RNA binding studies using 
gel mobility shift assays suggest that the function of the positively 
charged central channel is to preferentially guide dsRNA through the 
supercomplex, either towards the polymerase (nsp12) or away from it 
during replication. Mutagenesis experiments indicate that residues R26 
and K32 of nsp7 and K77, R80, K63, R84 and R85 of nsp8 are among those 
that line the channel and are primarily responsible for this 
translocation (Zhai et al., 2005). 

The FCoV nsp7 and nsp8 proteins were recently shown to adopt 
structures similar to their SARS-CoV equivalents (Fig. 5), but in a 
distinctive 2:1 protein complex (Xiao et al., 2012). No homologs of 
either protein are known outside the Coronaviridae lineage within 
statistical limits of significance. 

SCOP places nsp7 in the “immunoglobulin/albumin-binding domain-like” 
fold, with three of its helices arranged as a bundle and an overall 
topology that mirrors the spectrin-like fold. The globular core domain 
of the nsp8 golf club has been defined as a new fold. 


3.13.2. Function 

The most striking functional insight into the supercomplex was 
obtained by Canard and co-workers, who showed that coronavirus nsp8 
encodes a second, non-canonical RNA polymerase activity (Imbert et al., 
2006). This template-dependent oligonucleotide-synthesizing activity, 
which depends on Mn²⁺ or Mg²⁺ cations, was found to be preferentially 
enhanced by internal 5'-(G/U)CC-3' trinucleotides on RNA templates, 
which were used to initiate the synthesis of complementary 
oligonucleotides. Typical extension products were found to be fewer 
than 6 residues long. Nsp8 efficiently copied poly(rC) and oligo(rC15) 
templates, copied poly(rU) to a lesser extent and did not copy 
poly(rA). This accessory polymerase, which is both catalytically weaker 
and of lower fidelity than the main viral RdRp (nsp12), was potently 
inhibited by 3'-dGTP and to a lesser extent by ddGTP and 
2'-O-methyl-GTP, suggesting an avenue for possible therapeutic 
inhibition. The primase activity of nsp8 was blocked by N-terminal 
extension of nsp8 with peptides other than nsp7 (te Velthuis et al., 
2012). 
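The template preference reported by Imbert et al. (2006) can be illustrated by locating 5'-(G/U)CC-3' trinucleotides in an RNA template. The Python sketch below only finds candidate motifs, reading the template 5' to 3' and allowing overlapping matches. It does not model where primer synthesis actually starts, the length of the products or the metal-ion dependence. The function name and the example sequence are hypothetical.

```python
import re

# Zero-width lookahead so that overlapping (G/U)CC occurrences are all found.
GUCC = re.compile(r"(?=[GU]CC)")


def find_gucc_sites(template: str) -> list[int]:
    """Return 0-based start positions of every 5'-(G/U)CC-3' in a 5'->3' template."""
    seq = template.upper().replace("T", "U")  # tolerate DNA-style input
    return [m.start() for m in GUCC.finditer(seq)]


# Hypothetical template for illustration only (not a viral sequence).
template = "AAGCCAUUCCAGCCCA"
print(find_gucc_sites(template))
```

On this toy template, the function reports positions 2, 7 and 11 (GCC, UCC and GCC). The CCC at position 12 is correctly ignored.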

Initial mutagenesis experiments on nsp8 implicated four residues, K58, 
R75, K82 and S85, as essential for polymerization, but a more recent 
study also identified a magnesium ion binding site at D50 and D52, 
corresponding to a functional D/E-x-D/E motif (te Velthuis et al., 
2012). All of these residues localize on the long α-helix (the shaft 
of the golf club) and map onto one of several dimer interface regions 
of the supercomplex. Overall, the main function of nsp8 appears to be 
to catalyze the synthesis of short RNA primers that can be used by the 
primer-dependent main SARS-CoV polymerase, nsp12. Using dual-labeling 
immunofluorescence microscopy, Prentice and co-workers showed that 
nsp8 co-localizes with nsp2 and nsp3 in cytoplasmic complexes that 
also contain LC3, a general marker for autophagic vacuoles (Prentice 
et al., 2004). 

A striking observation by Masters and co-workers (Zust et al., 2007b) 
provided compelling evidence that nsp8 specifically interacts with a 
molecular switch composed of a bulged stem-loop and an RNA pseudoknot 
in the 3' untranslated region of the MHV genome, and indeed of most 
coronaviral genomes. These studies are aiding the development of a 
model that explains the origin and initiation of negative-strand 
genomic RNA synthesis (te Velthuis et al., 2012). 

Yeast two-hybrid screening and co-immunoprecipitation experiments, 
subsequently confirmed by in vivo co-localization studies by Lai and 
co-workers, have shown that nsp8 also interacts with the sars6 gene 
product, thereby implicating this accessory protein in the replication 
complex (Kumar et al., 2007). In a proteome-wide yeast two-hybrid 
screen, nsp8 was found to be one of the most promiscuous non-structural 
proteins, interacting with no fewer than 13 of the 29 SARS-CoV proteins 
tested (von Brunn et al., 2007). 


Table 1 

Cleavage sites of SARS-CoV Mpro, using Tor2 as the reference SARS-CoV strain. Each site is shown as P3 P2 P1 | P1' P2' P3', flanked by the residue numbers of P3 and P3'. 

Junction    Cleavage site 
Nsp4/5      3238 VLQ|SGF 3243 
Nsp5/6      3544 TFQ|GKF 3549 
Nsp6/7      3834 TVQ|SKM 3839 
Nsp7/8      3917 TLQ|AIA 3922 
Nsp8/9      4115 KLQ|NNE 4120 
Nsp9/10     4228 RLQ|AGN 4233 
Nsp10/12    4367 LMQ|SAD 4372 
Nsp12/13    5299 VLQ|AVG 5304 
Nsp13/14    5900 TLQ|AEN 5905 
Nsp14/15    6427 RLQ|SLE 6432 
Nsp15/16    6773 KLQ|ASQ 6778 
Fig. 5. Structures of the coronavirus replicase proteins nsp7, nsp8 and nsp9. Structures for the nsp7 and nsp8 heterodimers from SARS-CoV (2AHM; panel A) and FCoV (3UB0; 
panel B) are shown to illustrate the distinctive structures of these proteins. The structure of homodimeric SARS-CoV nsp9 is taken from PDB entry 1QZ8 and is shown in panel 
C. 


3.14. Nsp9 

See Box 5. 

3.14.1. Structure 

Two groups have independently determined the structure of nsp9 
(Egloff et al., 2004; Sutton et al., 2004). It adopts a β-barrel fold 
with a C-terminal α-helix (Fig. 5). The structure of HCoV-229E nsp9 
was subsequently solved and found to contain a similar fold (Ponnusamy 
et al., 2008). Nsp9 of both viruses was found to be dimeric, although 
in HCoV-229E nsp9 the dimer interface was stabilized by an additional 
disulfide bond. 

3.14.2. Function 

Nsp9 binds ssRNA and dsDNA in a concentration-dependent manner (Egloff 
et al., 2004; Sutton et al., 2004). Optimal binding occurs with 45-mer 
oligonucleotides, consistent with the nucleic acid wrapping once around 
the nsp9 dimer (Egloff et al., 2004). Because its RNA binding is not 
sequence specific, and given its abundance in the infected cell, nsp9 
may protect nascent ssRNA from nucleases during viral RNA synthesis 
(Egloff et al., 2004). Nsp9 colocalizes in the perinuclear region with 
other components of the replication complex (Bost et al., 2000). While 
the precise role of nsp9 in viral replication is not yet clear, Miknis 
and co-workers investigated the role of the dimer interface and 
demonstrated that dimerization of SARS-CoV nsp9 is essential for 
efficient viral growth (Miknis et al., 2009). 

3.15. Nsp10-11 
3.15.1. Structure 

The tenth coronavirus non-structural protein constitutes the 
carboxyl-terminal conserved domain of replicase polyprotein 1a, and a 
region of this polyprotein homologous to nsp10 can be readily 
identified in all coronaviruses (Joseph et al., 2006). Originally 
described as a growth factor-like protein on the basis of its high 
cysteine content and sequence homology (Gorbalenya et al., 1989), 
nsp10 is a highly conserved component of the coronavirus replicase 
machinery. Reciprocal BLAST searches using various nsp10 homologs 
identify a region of homology near the carboxyl terminus of 
polyprotein 1a in more distantly related nidoviruses such as torovirus 
and the newly identified white bream virus (Schutze et al., 2006). 
However, no region of nsp10 homology has been noted to date in the 
ronivirus or arterivirus lineages of the Nidovirales. Bioinformatic 
analysis of nsp10 does not yield any consistent matches to conserved 
enzymatic signatures. Thus, the profile of nsp10 more likely fits a 
role as an auxiliary replicase component than as an essential 
replicase enzyme. 

Box 5: Key nsp9, nsp10 and nsp11 structures 

Virus       Domain          Method          Accession   Reference 
HCoV-229E   nsp9            NMR             2J97        Ponnusamy et al. (2008) 
SARS-CoV    nsp9            X-ray (2.6 Å)   1QZ8        Egloff et al. (2004) 
SARS-CoV    nsp10           X-ray (1.8 Å)   2FYG        Joseph et al. (2006) 
SARS-CoV    nsp10 + nsp11   X-ray (2.1 Å)   2G9T        Su et al. (2006) 

X-ray crystal structures (Bhardwaj et al., 2006; Joseph et al., 2006) 
have revealed that nsp10 is a single-domain protein consisting of a 
pair of antiparallel N-terminal helices stacked against an irregular 
β-sheet, a coil-rich C terminus and two zinc fingers. As such, nsp10 
represents a novel fold, as might be expected from the lack of protein 
or domain homology to other known proteins. Bacterially expressed 
nsp10 binds single-stranded and double-stranded nucleic acids 
non-specifically, with micromolar affinity (Joseph et al., 2006). 

Within the polyprotein, coronavirus nsp10 is followed by a short 
peptide of highly variable sequence that maps to the region of the 
genomic RNA containing the ribosomal frameshift signal that leads to 
translation of the replicase enzyme cluster in open reading frame 1b. 
In SARS-CoV, nsp11 is a 13-residue peptide that can theoretically be 
processed from the C-terminus of polyprotein 1a; however, processing 
of nsp11 has not been demonstrated in infected cells. The structure of 
the uncleaved nsp10-11 polypeptide showed some differences in 
oligomerization and crystal packing, but little difference in the core 
nsp10 structure, and the nsp11 density was disordered (Bhardwaj et al., 
2006). Thus, nsp11 more likely forms part of an essential ribosomal 
frameshifting mechanism and is unlikely to significantly influence the 
function of nsp10. Synthesized nsp11 peptide is fairly insoluble in 
aqueous buffers (J. Joseph, unpublished data). 

3.15.2. Function 

The first assignment of a function to nsp10 came from a study of MHV strains containing temperature-sensitive lesions affecting viral RNA synthesis (Sawicki et al., 2005). The same study found that the nsp10 defect could not be complemented in cells by co-infection with viruses harboring temperature-sensitive lesions in nsp4 or nsp5, suggesting that coronavirus polyprotein 1a (at least from nsp4 onward) forms a single functional unit important for coronavirus discontinuous negative-strand RNA synthesis (Sawicki et al., 2005). Mutagenesis studies have confirmed the importance of nsp10 for general RNA synthesis and for controlling the ratio of subgenomic to genomic RNA (Donaldson et al., 2007b). Deletion of nsp10 or rearrangement of the genes encoding nsp7-10 completely inhibited virus growth, while alteration of the Mpro cleavage site between nsp9 and nsp10 reduced viral growth (Deming et al., 2007). An unexpected finding was that the temperature-sensitive lesion in nsp10 correlates with a severe inhibition of Mpro activity at the non-permissive temperature (Donaldson et al., 2007a). These results indicate that the function of nsp10 is closely tied to viral RNA synthesis. Nsp10 is now known to form part of the viral mRNA cap methylation complex (Bouvet et al., 2010), which is discussed below with the viral methyltransferase subunits.
3.16. Nsp12

3.16.1. Structure 

Nsp12 is a cleavage product of replicase polyprotein 1ab and is produced by the action of Mpro. It is 932 residues long in SARS-CoV. Immunoblotting and immunofluorescence analyses indicated that the full-length, 106-kDa RdRp protein is present in infected Vero cells and is part of the viral replication cycle (Prentice et al., 2004).

Cheng et al. (2005) expressed nsp12 in E. coli and found that during purification the protein was cleaved into three stable fragments (1-110, 111-368, 369-932), which may correspond to separate domains.

Nsp12 has high sequence identity with other coronavirus RdRps, but very low similarity to other viral polymerases. Based on manual sequence alignments with the RdRps of poliovirus, rabbit hemorrhagic disease virus, hepatitis C virus, reovirus and bacteriophage φ6, as well as HIV-1 reverse transcriptase, Xu et al. (2003) identified conserved sequence motifs and, by homology, assigned functions to these regions in the catalytic domain of the polymerase.

Xu et al. (2003) built a three-dimensional model of the catalytic domain of nsp12 (PDB ID 105S), based on alignments of conserved motifs with other viral polymerase proteins. In this model, the catalytic domain adopts the canonical “palm and fingers” architecture. The fingers subdomain is predicted to span residues 376-584 and 626-679 and to consist of α-helices in the base and β-strands and coils at the tip (Xu et al., 2003). As in the HCV and RHDV RdRps, the fingers subdomain also contains an N-terminal portion (residues 405-444) that forms a long loop starting from the fingertip and bridging the fingers and thumb subdomains. The palm subdomain of SARS-CoV nsp12 (residues 585-625 and 680-807) forms the catalytic core and contains the four highly conserved sequence motifs (A-D) found in all polymerases and a fifth motif (E) unique to RdRps and RTs (Poch et al., 1989). The core structure of the palm subdomain is well conserved across all classes of polymerases. It consists of a central three-stranded β-sheet flanked by two α-helices on one side and a β-sheet and an α-helix on the other. Residues forming the catalytic active site are found within motifs A and C.
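The predicted residue ranges quoted above can be collected into a small lookup table, which makes it easy to check where a residue of interest (for example, a mutation site) falls in the model. The Python sketch below is illustrative only: the boundaries are the model-based predictions of Xu et al. (2003) as quoted in the text, not experimentally defined domain limits, and the names `PREDICTED_SUBDOMAINS` and `subdomain_of` are hypothetical.

```python
# Predicted subdomain boundaries of the SARS-CoV nsp12 catalytic domain,
# taken from the comparative model described above (1-based, inclusive).
# These are model predictions, not experimentally determined limits.
PREDICTED_SUBDOMAINS = {
    "fingers": [(376, 584), (626, 679)],
    "palm": [(585, 625), (680, 807)],
}

NSP12_LENGTH = 932  # residues in SARS-CoV nsp12


def subdomain_of(residue: int) -> str:
    """Return the predicted subdomain for a residue number (hypothetical helper)."""
    if not 1 <= residue <= NSP12_LENGTH:
        raise ValueError(f"residue {residue} outside 1-{NSP12_LENGTH}")
    for name, ranges in PREDICTED_SUBDOMAINS.items():
        if any(start <= residue <= end for start, end in ranges):
            return name
    # Residues outside the quoted ranges (e.g. the N-terminal region)
    # are simply not mapped in this sketch.
    return "unassigned"


if __name__ == "__main__":
    for r in (420, 600, 50):
        print(r, subdomain_of(r))
```

Residue 420, inside the fingertip loop (405-444), maps to the fingers, and residue 600 maps to the palm; the interleaved ranges reflect the way the fingers and palm alternate along the sequence.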

The structures of RdRps of a few non-CoVs have been determined, for example those of hepatitis C virus, poliovirus, rabbit hemorrhagic disease virus, reovirus, bacteriophage φ6 and HIV-1 (see Xu et al., 2003). However, there is very low sequence similarity between these structures and CoV RdRps. Xu et al. (2003) constructed a comparative molecular model for SARS-CoV RdRp based on these structures, using manual sequence alignments anchored by conserved sequence motifs shared by all RdRps and reverse transcriptases.

3.16.2. Function 

The RdRp is the central enzyme of the multi-component viral replicase complex, which replicates the viral RNA genome and contains several other viral proteins as well (Bost et al., 2000; Brockway et al., 2003). The replicase transcribes (i) full-length negative- and positive-strand RNAs; (ii) a 3'-co-terminal set of nested subgenomic mRNAs that have a common 5' ‘leader’ sequence derived from the 5' end of the genome; and (iii) subgenomic negative-strand RNAs with common 5' ends and leader-complementary sequences at their 3' ends (Lai, 2001; Thiel et al., 2003). Full-length nsp12 has RdRp activity. The “catalytic” 64-kDa domain and the N-terminal 12-kDa domain form a complex with comparable RdRp activity, whereas the 64-kDa domain in isolation has no activity. Cheng and coworkers suggest that the N-terminal domain is required for polymerase activity, possibly through involvement in template-primer binding (Cheng et al., 2005). Snijder and co-workers confirmed that full-length nsp12 has robust, primer-dependent RNA polymerase activity (te Velthuis et al., 2010), a finding generally confirmed by the later study of Ahn et al. (2012).

Inhibitors of viral polymerases have had some success as therapeutics, which makes the RdRp an attractive drug target. Lu et al. used a short interfering RNA targeting the RdRp and found that it significantly reduced plaque formation by SARS-CoV in Vero-E6 cells (Brockway et al., 2004). However, such an approach would affect expression of the entire polyprotein 1a/1ab and would not be specific to the RdRp. He et al. (2004) showed that aurintricarboxylic acid could potently reduce viral titer, by more than 1000-fold, when added to cells in culture. The same group subsequently suggested, by analogy to other RNA polymerases, that this compound may act on nsp12, and performed docking studies to predict its binding site (Yap et al., 2005).

The RdRp would be predicted to interact, directly or indirectly, with several other viral proteins, including the nsp3-6 scaffold proteins, the nsp10-16 methylation complex and the nsp7-8 primase. Adedeji and coworkers showed that SARS-CoV nsp12 enhances the helicase activity of nsp13 two-fold (Adedeji et al., 2012). Nsp12 interacts with nsp8, nsp13, sars3a and sars9b in yeast two-hybrid experiments, and with nsp8 in co-immunoprecipitation experiments. Previous immunolocalization and interaction studies in MHV have also indicated that in vivo, nsp12 may act in concert with numerous other viral proteins, namely the counterparts of SARS-CoV nsp1, 2, 5, 8, 9, 13 and sars9a (Bost et al., 2000; Brockway et al., 2003; von Brunn et al., 2007).

3.17. Nsp13

3.17.1. Structure and function 

Nsp13 is a helicase capable of unwinding both RNA and DNA duplexes in the 5'-to-3' direction with high processivity (Ivanov et al., 2004; Tanner et al., 2003). It possesses nucleoside and deoxynucleoside triphosphatase (NTPase/dNTPase) activity against all standard nucleotides and deoxynucleotides, and also RNA 5'-triphosphatase activity, which may be involved in the first step of formation of the 5' cap structure of the viral mRNAs (Ivanov et al., 2004; Tanner et al., 2003). The two hydrolase activities likely share a common active site, which contains a canonical Walker A NTPase-like motif (Ivanov et al., 2004). Since NTPase/helicase proteins are considered essential for viral viability (Kadare and Haenni, 1997), they are potential drug targets (Anand et al., 2003; Holmes, 2003). Promising helicase inhibitors are in trials for herpes simplex virus (Kleymann, 2003) and hepatitis C virus infections (Borowski et al., 2002). Several SARS-CoV helicase inhibitors, the bananin derivatives, have been identified (Tanner et al., 2005).

While the structure of nsp13 has not yet been determined, the protein has been modeled on the E. coli Rep ATP-dependent DNA helicase (PDB accession 1UAA). The model of the helicase domain spanning positions 80-568 of SARS-CoV nsp13 has been deposited (PDB accession 2G1F; Bernini et al., 2006). The N-terminus of nsp13 contains conserved cysteine and histidine residues that are probably homologous to the metal-binding domains at the N-termini of arterivirus helicases, which coordinate up to four Zn2+ ions (van Dinten et al., 2000).

IBV nsp13 also has a proposed role in modulating the host response, although it is not yet clear whether this role is conserved in other coronaviruses (Xu et al., 2011). Overexpression of nsp13 led to cell cycle arrest by interfering with DNA polymerase delta, though the report did not determine whether this effect occurs during natural viral infection.
3.18. Nsp14

3.18.1. Structure 

SARS-CoV nsp14 is 527 residues long and multifunctional. Its structure has not yet been solved. All coronaviruses have nsp14 homologues, each containing an N-terminal domain with 3'-to-5' exonuclease (ExoN) motifs I (DE), II (D) and III (D) within the first ~280 residues of the protein (Moser et al., 1997; Thiel et al., 2003; Zuo and Deutscher, 2001) and a C-terminal cap N7-methyltransferase domain (Chen et al., 2013; Minskaia et al., 2006). Compared with other RNA ExoNs, CoV and torovirus ExoNs have an additional putative zinc finger between ExoN motifs I and II (Thiel et al., 2003).

The coronaviral N7-methyltransferase is unusual in that it is physically and functionally linked with the exoribonuclease domain (Chen et al., 2013). Most of the residues known to be essential for methylation are located around the sequence motif DxGxPxA at positions 331-338 of nsp14, which is predicted to form the S-adenosyl-L-methionine binding pocket of the methyltransferase domain.

3.18.2. Function 

Arteriviruses, which are nidoviruses with genomes about half the size of coronavirus genomes, have neither an ExoN homologue nor a homologue of either of the viral methyltransferases. This suggests that ExoN is required in CoVs for stable synthesis of their exceptionally large RNA templates (Minskaia et al., 2006).

SARS-CoV nsp14 has 3'-to-5' exonuclease activity on both ssRNA and dsRNA (Minskaia et al., 2006). Recombinant nsp14 (as a maltose-binding protein fusion) hydrolyzed ssRNA to a ~12-nucleotide product. When the DE-D-D residues were substituted with alanine, this activity was abolished or greatly impaired: D90A/E92A and H268A mutants had very low activity, while N238A, D243A and D273A had undetectable activity (Minskaia et al., 2006). DsRNA also significantly enhanced the exonuclease activity of the enzyme (Minskaia et al., 2006). DNA and 2'-O-methylated RNA are resistant to cleavage (Minskaia et al., 2006). The activity of nsp14 is strictly dependent on divalent cations (Chen et al., 2007a). Activity was highest in the presence of Mg2+ or Mn2+, lower in the presence of low concentrations of Zn2+ (0.5 mM), and undetectable with Ca2+ or higher concentrations of Zn2+. With Mn2+, the product is slightly smaller than that obtained with Mg2+ or Zn2+, indicating that the metal ions may modulate the configuration of the active site differently (Chen et al., 2007a; Minskaia et al., 2006).

In MHV, nsp14 greatly enhances replication fidelity, which is essential for the replication and stability of the unusually large CoV genome (Eckerle et al., 2007). Recombinant viruses with mutations in the nsp14 active site were defective in growth and RNA synthesis and carried 15-fold more mutations than wild-type viruses. Nsp14 therefore appears to play a role in preventing or repairing nucleotide misincorporation during RNA synthesis (Eckerle et al., 2007). Recombinant HCoV-229E containing mutations in the active site of nsp14 had severe defects in RNA synthesis, and no viable virus could be recovered. Besides strongly reduced genome replication, specific defects in sgRNA synthesis, such as aberrant sizes of specific sgRNAs and changes in the molar ratios between individual sgRNA species, were observed (Minskaia et al., 2006). Sperry et al. (2005) and Eckerle et al. (2006) have shown that a Tyr→His mutation (equivalent to SARS-CoV nsp14 Tyr420His) in an infectious clone of MHV-A59 attenuates virus replication and virulence in mice, further supporting the importance of this protein as a proofreading component of the viral replication machinery. Based on temperature-sensitive mutants of MHV, Sawicki et al. (2005) showed that nsp14 is essential for the assembly of a functional replicase-transcriptase complex and appears to affect positive-strand synthesis, as would be expected for a protein involved in both capping and mismatch repair.


Box 6: Key nsp15 and 16 structures

Virus Domain Method Accession Reference

MHV nsp15 X-ray (2.7 Å) 2GTH Xu et al. (2006)

SARS-CoV nsp15 X-ray (2.6 Å) 2H85 Ricagno et al. (2006b)

SARS-CoV nsp10 + nsp16 X-ray (2.0 Å) 2XYQ Decroly et al. (2011)


Nsp14 interacts with nsp10 and nsp16 to form the viral cap methylation complex, as described in more detail under nsp16 below. Y2H and co-immunoprecipitation studies suggest that nsp14 may also interact with nsp8 and sars9b (von Brunn et al., 2007).

3.19. Nsp15

See Box 6.

3.19.1. Structure 

Nsp15 of SARS-CoV is a 346-residue polypeptide released by Mpro cleavage of polyprotein 1ab at the sites 6427RLQ↓SLE6432 and 6773KLQ↓ASQ6778. It is one of the best-studied RNA processing enzymes of the coronaviral replicase, with several recent studies focusing on its structural and functional characterization because of its potential importance as a drug target. Studies on HCoV-229E and equine arteritis virus have shown that inactivating this enzyme by site-directed mutagenesis renders these viruses non-viable. This enzyme is a genetic marker of nidoviruses, as no known homologs of nsp15 exist among RNA viruses outside the Nidovirales. Nsp15 preferentially cleaves RNA on the 3' side of uridylates at GUU or GU sequences, producing molecules with 2'-3' cyclic phosphate ends (Bhardwaj et al., 2004). It acts on both double-stranded RNA and single-stranded RNA (ssRNA), and its activity is dependent on the presence of Mn2+ ions (Bhardwaj et al., 2004; Guarino et al., 2005). The ion binds only weakly but nonetheless produces substantial conformational changes in the active site loops (Bhardwaj et al., 2004; Bhardwaj et al., 2006).
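To make the stated sequence preference concrete, the Python sketch below marks candidate cleavage positions in an RNA string. It is a deliberately naive motif scan, assuming (as a simplification of the GU/GUU preference quoted above) that cutting occurs on the 3' side of any uridylate directly preceded by a guanylate. It ignores RNA structure, 2'-O-methylation and any other context, so it is a toy filter rather than a predictor of nsp15 activity; `candidate_nsp15_sites` and `digest` are hypothetical names.

```python
import re


def candidate_nsp15_sites(rna: str) -> list[int]:
    """Return 0-based indices of U residues in a 5'-GU-3' context.

    Cleavage is assumed (for illustration only) to occur on the 3' side
    of each returned U, i.e. between rna[i] and rna[i + 1].
    """
    seq = rna.upper().replace("T", "U")  # accept DNA-style input
    # The zero-width lookbehind requires a preceding G without consuming it,
    # so adjacent motifs such as "GUGU" are both reported.
    return [m.start() for m in re.finditer(r"(?<=G)U", seq)]


def digest(rna: str) -> list[str]:
    """Split an RNA string at every candidate site (hypothetical helper)."""
    seq = rna.upper().replace("T", "U")
    fragments, start = [], 0
    for i in candidate_nsp15_sites(seq):
        if i + 1 < len(seq):  # a U at the 3' end leaves nothing to cut off
            fragments.append(seq[start:i + 1])
            start = i + 1
    fragments.append(seq[start:])
    return fragments


if __name__ == "__main__":
    print(candidate_nsp15_sites("AAGUUCGUA"))
    print(digest("AAGUUCGUA"))
```

For "AAGUUCGUA" the scan reports the uridylates at indices 3 and 7, and `digest` returns the fragments AAGU, UCGU and A; in a real product the 3' end of each upstream fragment would carry the 2'-3' cyclic phosphate.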

Several groups have characterized the structure of nsp15, by cryoEM (Guarino et al., 2005) and by X-ray crystallography of the SARS-CoV enzyme (Bhardwaj et al., 2007; Guarino et al., 2005; Joseph et al., 2007; Ricagno et al., 2006a,b; Xu et al., 2006), the MHV enzyme (Xu et al., 2006) and its eukaryotic homolog XendoU from Xenopus laevis (Renzi et al., 2006). The coronaviral structures have revealed a three-domain architecture (Fig. 6). Again, not surprisingly, the catalytic C-terminal domain has a novel fold. The first two domains (residues 1-190) are topologically similar to methyltransferases, forming a ‘spitting image’ of the SAM-dependent methyltransferase fold as defined in the SCOP database (A. Murzin, personal communication; Fig. 6). The full-length MHV and SARS-CoV nsp15 enzymes assemble as hexamers, their biologically relevant oligomeric state, forming a hollow, toroid-shaped structure. Hexamerization is absolutely essential for both metal ion binding and catalytic activity (Guarino et al., 2005). The eukaryotic homolog XendoU from X. laevis is much shorter (lacking the first two domains) and shares only the conserved catalytic domain. In fact, it was the first structure of this endoribonuclease fold to be determined. It is a functional monomer in solution. The catalytic center of nsp15 retains features that resemble the active site of an unrelated nuclease, RNase A (Cuchillo et al., 2011).

Fig. 6. Structures of the replicase proteins nsp10, 15 and 16. Panel A shows the hexameric structure of SARS-CoV nsp15 (PDB entry 2H85). Panel B shows the SARS-CoV nsp10-nsp16 complex (PDB entry 2XYQ), with nsp10 on the left.

3.19.2. Function

While the enzymatic activity of nsp15 is now fairly well understood, its role in the coronavirus replication cycle is not. Nsp15 cleaves at uridylates preceded by cytidylate or adenylate residues. When model RNA substrates were 2'-O-ribose methylated, endonucleolytic activity was blocked, indicating a possible functional link between nsp15 and the 2'-O-ribose methyltransferase nsp16. A structure-based catalytic mechanism of
endonucleolytic activity of nsp15 was proposed by Ricagno et al. (2006b), based on active site similarities with RNase A. In this model, Lys-289, His-249 and His-234 act as the main catalytic triad, while Ser-293 and Tyr-342 play a supporting role by stabilizing the aromatic ring of the nucleotide. Despite the structural uniqueness of nsp15, the actual mode of scissile bond cleavage is thought to be very similar to that of several RNases of this class that share the same catalytic triad, e.g., RNase T1, RNase A and others. The six active sites of the hexamer are spatially segregated and are thought to function independently of one another. The actual electrostatic contribution of the Mn2+ ion to catalysis is unclear. In XendoU, Mn2+ impedes neither RNA substrate binding nor cleavage (Renzi et al., 2006). However, fluorescence experiments indicate that upon metal ion binding the protein undergoes large structural transitions, suggesting an indirect, possibly structural role for the metal in stabilizing the enzyme in a catalytically competent “on” state rather than an otherwise inactive “off” state (Renzi et al., 2006). Drawing analogies with the Zn-dependent endonuclease EndN, Ricagno and co-workers have hypothesized that, given its proximity to the catalytic site, Y342 might be the residue involved in Mn2+ ion binding by forming a cation-π interaction, assisted by histidine H249 and the 2'-O-ribose moiety of the substrate. Molecular modeling and docking studies led Renzi et al. (2006) to propose a similar mechanism for endonucleolytic cleavage by the eukaryotic homolog XendoU, wherein the O4 of the UMP nucleotide forms a potential hydrogen bond with the catalytic histidine H178 and the pyrimidine ring of the nucleotide is involved in a stacking interaction with the aromatic ring of tyrosine Y280.

The structure of a truncated form of SARS-CoV nsp15 lacking the N-terminal hexamerization domain revealed striking changes in the active site loops of the catalytic domain, suggesting allosteric control of endonucleolytic activity and providing a direct link between oligomerization and function (Joseph et al., 2007). In this structure, which lacked the first 27 amino acids of nsp15, the “active site loop” (residues 234-249, spanning the two active site histidines H234 and H249) was dramatically shifted, flipping by as much as ~120° into the active site cleft. In the full-length nsp15 hexamer, the “active site loop” and the “supporting loop” are packed against each other and are stabilized by intimate interactions with residues contributed by the adjacent monomer.

3.20. Nsp16

3.20.1. Structure

Nsp16 lies at the C-terminal end of polyprotein 1ab and is released when Mpro cleaves the polyprotein at the nsp15/16 junction (Snijder et al., 2003). Although viral methyltransferases were first identified in flavi- and reoviruses (Koonin, 1993) about two decades ago, their role in viral replication has only recently begun to be explored systematically (Gorbalenya et al., 2006). After rigorous sequence and structure analysis using 3D-Jury-based metaserver prediction methods, Rychlewski and co-workers noticed a strong but remote homology between SARS-CoV nsp16 and an ancient family of S-adenosyl-L-methionine (SAM)-dependent 2'-O-ribose methyltransferase enzymes (von Grotthuss et al., 2003; Ferron et al., 2002). The sequence of the SARS-CoV MTase has features that place it in the vicinity of the RrmJ/fibrillarin superfamily of 2'-O-ribose methyltransferases (Feder et al., 2003).

The structure of SARS-CoV nsp16 has been determined as part of an nsp16-nsp10 complex by two groups independently (Chen et al., 2011; Decroly et al., 2011). Nsp16 adopts a canonical S-adenosyl-L-methionine-dependent methyltransferase fold, with a central β-sheet framed by a helical clamp and a conserved catalytic KDKE tetrad (Martin and McMillan, 2002). The nsp16 topology matches those of the dengue virus NS5 methyltransferase (Egloff et al., 2002) and the vaccinia virus VP39 O-methyltransferase (Hodel et al., 1996). The structure of the nsp16/nsp10 interface shows that nsp10 interacts with, and probably helps to stabilize, the S-adenosyl-L-methionine binding pocket. This has the effect of lengthening the putative RNA-binding groove of nsp16. The study by Decroly et al. (2011) also demonstrated that the methyltransferase inhibitor sinefungin binds in the nsp16 active site, and could therefore form the basis of a new generation of inhibitors that target the coronavirus methylation process. The structure of nsp10 was found to be virtually identical when solved in the presence and absence of nsp16 (Chen et al., 2011; Decroly et al., 2011; Joseph et al., 2006; Su et al., 2006).

3.20.2. Function 

Nsp16 has been shown to interact with nsp10 and nsp14 to form a viral cap methylation complex (Bouvet et al., 2010). All eukaryotic mRNAs possess a modified guanosine “cap” at the 5' terminus, a feature that confers protection against degradation by host nucleases. First reported in the early 1970s (Gingras, 2009), the cap structure has been found in almost all eukaryotic viral RNAs. The widely adopted generic nomenclature is m7G(5')ppp(5')X(m)pY(m), where m7G corresponds to the modified 7-methylguanosine nucleotide and (m) marks an optional 2'-O-methyl group on the first (X) and second (Y) transcribed nucleotides. O-methyltransferases such as nsp16 perform the final step of cap synthesis, which involves adding a methyl group to the 2'-O position of the first nucleotide following the m7G, and sometimes adding a methyl group at the same position on subsequent nucleotides. While the m7G cap is essential for efficient splicing, nuclear export, translation and stability of eukaryotic mRNA, 2'-O-methylation is not (Cougot et al., 2004; Lewis and Izaurralde, 1997; Schwer et al., 1998).
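The cap nomenclature can be illustrated with a small formatter that writes out the generic m7G(5')ppp(5')X(m)pY(m) notation for chosen first and second nucleotides and attaches the conventional label: cap 0 (no ribose methylation), cap 1 (first nucleotide 2'-O-methylated, the modification described above for nsp16) or cap 2 (first and second nucleotides methylated). The Python function `cap_structure` below is hypothetical and only formats notation; it does not model the enzymology.

```python
def cap_structure(first: str, second: str, methylated: int) -> tuple[str, str]:
    """Format m7G(5')ppp(5')X(m)pY(m) and return (notation, cap type).

    `methylated` is the number of leading transcribed nucleotides carrying
    a 2'-O-methyl group: 0 (cap 0), 1 (cap 1) or 2 (cap 2).
    """
    if methylated not in (0, 1, 2):
        raise ValueError("methylated must be 0, 1 or 2")
    # An "m" suffix marks a 2'-O-methylated ribose on that nucleotide.
    x = first + ("m" if methylated >= 1 else "")
    y = second + ("m" if methylated == 2 else "")
    return f"m7G(5')ppp(5'){x}p{y}", f"cap {methylated}"


if __name__ == "__main__":
    for n in (0, 1, 2):
        print(cap_structure("A", "G", n))
```

With first nucleotide A and second nucleotide G, the three calls give m7G(5')ppp(5')ApG (cap 0), m7G(5')ppp(5')AmpG (cap 1) and m7G(5')ppp(5')AmpGm (cap 2). The nucleotide letters here are arbitrary placeholders, not the actual SARS-CoV 5' sequence.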
Züst et al. (2011) explored the role of nsp16 methylation in the viral replication cycle and found that 2'-O-methylation acts as a recognition marker that helps the host cell to identify its own RNA species and to respond to incompletely methylated cap structures. Nsp16 enables efficient replication by camouflaging newly synthesized viral RNA to resemble host mRNA, thereby blocking the induction of an interferon response. This suggests that drugs acting on nsp16 could interfere with viral replication both by inhibiting the replication process itself and by promoting intracellular recognition of, and response to, viral RNA species.

4. Conclusion - a Galapagos of new folds 

New coronavirus protein structures continue to have important implications for virus biology and antiviral design. An unexpected benefit of the coronavirus structure boom is a wealth of previously undiscovered protein folds. The Protein Data Bank keeps records of the discovery of new protein structures over time, and all known protein structures are currently classified into fewer than 1500 folds.

Since 2003, structures of 18 of the 28 coronavirus proteins, encompassing a total of 27 domains, have been determined experimentally. Several of these have been described as “new folds”, commonly defined as folds with sufficiently different topology according to structure comparison methods (DALI), fold classification schemes (CATH and SCOP) and the family assignment scheme of Pfam. By these criteria, 16 of the 27 domains are indeed new folds, a striking rate of fold discovery compared with the ~10% reported by structural genomics centers for model prokaryotes and eukaryotes.
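The comparison above reduces to simple arithmetic. The short Python snippet below uses only the counts quoted in the text (16 new-fold domains out of 27 solved domains, against an approximate 10% baseline) to show how large the difference is; the variable names are illustrative.

```python
# Figures quoted in the paragraph above.
NEW_FOLD_DOMAINS = 16   # domains classified as new folds
SOLVED_DOMAINS = 27     # experimentally determined domains
BASELINE_RATE = 0.10    # approximate (~10%) structural genomics rate

sars_rate = NEW_FOLD_DOMAINS / SOLVED_DOMAINS
fold_enrichment = sars_rate / BASELINE_RATE

print(f"New-fold rate among solved domains: {sars_rate:.0%}")
print(f"Enrichment over the ~10% baseline: {fold_enrichment:.1f}x")
```

This prints a new-fold rate of 59%, roughly 5.9 times the ~10% baseline. Because the baseline is itself approximate, the ratio should be read as an order-of-magnitude comparison rather than a precise figure.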

Why do coronaviruses possess an abundance of new folds? One obvious reason might be that these structures have been relatively unexplored, and therefore under-represented in the PDB. This is the first proteome-scale structural characterization of a coronavirus, and one with a disproportionately large number of singletons. The new folds come largely from the 16 nonstructural proteins of the replicase machinery, several of which have no counterparts outside the Nidovirales. Ideally, new folds enable us to model sequence homologues, thereby filling out the immediate neighborhood in structure space. This is nontrivial for SARS-CoV proteins, since the new folds are (so far) either true sequence singletons (nsp1, nsp2, nsp3a, sars9b) or found exclusively in the Coronaviridae (nsp4, 7, 8, 9, 10, 15, spike RBD, sars9a).

Fast mutation rates in viruses may encourage divergent sampling of fold space (Andreeva and Murzin, 2006). This, along with oligomerization, has been proposed as a major facilitator of fold evolution (Andreeva and Murzin, 2006), allowing a protein to morph into a new fold in a manner analogous to structural drift (Krishna and Grishin, 2005). Elucidation of new folds, especially for isolated groups of divergent homologues, should help improve fold recognition and comparative modeling algorithms.

These observations also have ramifications for the evolution of new viral strains, a phenomenon that results from two antagonistic forces: greater adaptability within an ecological niche (because of intrinsically fast mutation rates) and increased evolutionary constraints due to small genomes (Holmes and Rambaut, 2004). Overall, we are left with the piquant notion that proteins in viral proteomes may occupy a unique niche in fold space, and coronaviruses a peculiar island within this niche. Viruses are the most diverse biological entities on this planet and second only to prokaryotes in terms of sheer biomass. While the diversity of protein structures they represent certainly defies imagination, the key to our understanding of protein folds and their migration in tertiary fold space may well be locked up in them.


Appendix A. Supplementary data 

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.virusres.2013.12.004.